Abstract:


Web-based distributed modelling architectures are gaining increasing recognition as 
potentially useful tools to build holistic environmental models, combining individual 
components in complex workflows. However, existing web-based modelling 
frameworks currently offer no support for managing uncertainty. On the other hand, 
the rich array of modelling frameworks and simulation tools which support 
uncertainty propagation in complex and chained models typically lack the benefits of 
web-based solutions such as ready publication, discoverability and easy access. In this
article we describe the developments within the UncertWeb project which are 
designed to provide uncertainty support in the context of the proposed ‘Model Web’. 
We give an overview of uncertainty in modelling, review uncertainty management in 
existing modelling frameworks and consider the semantic and interoperability issues 
raised by integrated modelling. We describe the scope and architecture required to 
support uncertainty management as developed in UncertWeb. This includes tools 
which support elicitation, aggregation/disaggregation, visualisation and 
uncertainty/sensitivity analysis. We conclude by highlighting areas that require further 
research and development in UncertWeb, such as model calibration and inference 
within complex environmental models. 


1 Introduction 


The “Model Web” presents a vision of a future where models are exposed as Web 
Services in a flexible distributed architecture (Geller and Turner, 2007, Nativi et al., 
2011). The principle is that models are exposed on the Web and can be discovered, 
combined into complex workflows and executed over a distributed architecture. Such 
a system provides tremendous opportunities to enhance scientific modelling by: 

• improving the integration of different models to address practical questions;

• increasing the reproducibility and transparency of research by providing clear
and repeatable provenance information for modelling outputs;

• allowing more flexible deployment, for example in cloud architectures;

• facilitating the discovery and reuse of model components and code.


The Model Web is in all important respects a developing realisation of the ‘Web 
Service Modeling Framework’ conceptualised by Fensel and Bussler (2002) and the 
four key elements they identify as essential (ontologies, goal repositories, web 
services descriptions and mediators) map very closely to the tools described later in 
this article. 


A practical implementation of the Model Web concept requires interoperability of 
models and information models in an open system setting. This raises several 
important challenges. The first challenge is semantic; for multi-disciplinary models to 
successfully interact in robust systems modelling, there must be unambiguous 
definitions of all model inputs and outputs, and the scales on which these are 
measured, since different science domains may use different terms for the same 
phenomenon, or the same terms for different phenomena. Villa et al. (2009) describe 
the consequent need for some form of ‘declarative modelling’, and review recent 
responses to this problem in the field of environmental modelling. Such vital semantic 
mapping issues have been addressed in the context of distributed geospatial modelling 
by the SWING project and its follow-up, ENVISION, which is currently developing
semantic annotation, harvesting and ontology management tools to support the 
adaptive chaining required by the Model Web (Janowicz et al., 2010). 


In this paper we focus on another issue that faces all modelling frameworks including 
the Model Web; that of uncertainty management in an era of increasing access to both 
data and models. The data are typically Earth observations taken from both satellite 
and in situ systems, while the models range in complexity from empirical statistical 
models, through box or lumped conceptual models to fully distributed spatio-temporal 
simulators such as global climate models. Whatever their complexity, these models 
have several common features: they all read inputs (for convenience, we consider 
model parameters here also as model input), carry out computations or other 
manipulation on those inputs, and produce outputs. Model inputs may be observed or 
measured values, or they may be outputs from other models. In either case, model 
inputs are subject to errors, and these errors will contribute to the uncertainty of the 
model output. Quantifying this error or uncertainty is called error propagation 
(Heuvelink, 1998), as the error in model input is propagated into the error in model 
output. Additional uncertainty will be contributed by the modelling process itself, and 
we refer to this henceforth as model structure uncertainty (Beck, 2005, Refsgaard et
al., 2006a). The components of uncertainty in model results are more fully addressed
in later sections — at this stage in the discussion, the important issues are their 
existence and impact. In the following discussion, we will use the word ‘sensor’ in a 
broad sense, to indicate any agent which is capable of recording an observation of the 
real world. 


Model outputs and observational data are increasingly being used in policy and 
decision making, where the use of incomplete information is a risky undertaking, 
unless some attempt is made to account for and quantify the impacts of gaps in 
knowledge (Evans, 2008). Some of the most pressing issues facing society, such as 
climate change (Stainforth et al., 2006) and its economic impacts (Roughgarden and 
Schneider, 1999), sustainable development (Levy et al., 2000, DeLara and Marinet, 
2009) and future energy supplies (Jebaraj and Iniyan, 2006) are subject to significant 
uncertainties which seriously affect the development of strategy, as well as causing 
serious scientific debate as to the value and purpose of modelling (e.g., Dessai et al, 
2009). In general where there is a decision with a specific cost on taking some 
remedial action and a loss associated with taking no action (Berger, 1985), and where 
the costs and losses have significant non-linear dependency on model outcomes, often 
with critical thresholds, knowing the uncertainty in the model predictions can change 
the decision taken.
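
The effect of a critical threshold can be illustrated with a small worked example. The sketch below is not taken from UncertWeb: the function names, the Gaussian form of the predictive distribution and all numerical values are assumptions chosen purely for illustration. It compares a decision based on a single model prediction with one based on the expected loss under a predictive distribution.

```python
from statistics import NormalDist


def expected_loss_no_action(pred_mean, pred_sd, threshold, loss):
    """Expected loss of taking no action: loss times P(outcome > threshold).

    Assumes (for illustration only) a Gaussian predictive distribution.
    """
    p_exceed = 1.0 - NormalDist(pred_mean, pred_sd).cdf(threshold)
    return loss * p_exceed


def decide(pred_mean, pred_sd, threshold, cost, loss):
    """Return "act" if the cost of remedial action is below the expected loss."""
    if pred_sd == 0.0:
        # No uncertainty information: treat the prediction as exact.
        exp_loss = loss if pred_mean > threshold else 0.0
    else:
        exp_loss = expected_loss_no_action(pred_mean, pred_sd, threshold, loss)
    return "act" if cost < exp_loss else "wait"


# Hypothetical numbers: prediction 4.5, critical threshold 5.0,
# remedial action costs 10, exceeding the threshold costs 100.
without_uncertainty = decide(4.5, 0.0, 5.0, cost=10.0, loss=100.0)
with_uncertainty = decide(4.5, 1.0, 5.0, cost=10.0, loss=100.0)
```

With these assumed values the prediction lies below the threshold, so ignoring uncertainty suggests waiting. With a predictive standard deviation of 1.0 the probability of exceedance is about 0.31, the expected loss of inaction is about 31, and acting becomes the cheaper option.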


There is an increasing recognition of the importance of quantifying uncertainty in 
modelling (e.g. Geza et al, 2009; Allen et al, 2007; Clancy et al, 2010; Feyen and 
Caers, 2006; Cheng and Sandu, 2009). However, the treatment of uncertainty within 
modelling frameworks such as the proposed Model Web is not straightforward: 
firstly, many frameworks which can currently handle components published as 
services do not have strong or consistent support for propagating or analysing 
uncertainties, and secondly, the distributed environment introduces a number of new 
challenges. In service-, grid- and cloud-computing-based modelling frameworks,
model components may be discovered and composed in flexible and potentially 
complex workflows. However, if this is done without careful description of 
uncertainty and attention to the quality of (intermediate) model outputs, then the final 
model output may be too inaccurate for the intended use; and, more importantly, the 
user may be unaware of this fact. It is therefore essential that the reliability of 
intermediate and final results is quantified and communicated to the end user. 
Extending the Model Web to handle and convey uncertainty information in this way 
is a great challenge. 


Complex environmental and geospatial models have specific issues when it comes to 
uncertainty handling. These include: 

1. large amounts of observational and other data do not currently have reliable 
uncertainty information associated with them; 

2. most existing models used across the geosciences and beyond do not have 
reliable information about their model uncertainties, or model structure 
uncertainty, available; 

3. many of the phenomena of interest are spatial, temporal or spatio-temporal in 
nature, are measured and expressed at various spatial and temporal scales and 
often have strong correlations imposed by the physics and dynamics of the 
natural systems, all of which cause difficulties when evaluating uncertainty; 


4. representing spatially and temporally distributed systems typically requires 
large numbers of variables, and capturing the uncertainties and correlations in 
these variables is computationally demanding; 

5. most models have non-linear responses to their inputs, and thus can have 
complex probability distributions over their outputs, even for simple 
parametric input probability distributions; 

6. analytic results will be the exception rather than the rule, and thus Monte 
Carlo methods, with their associated computational expense, will be the 
default uncertainty propagation mechanism, implying limitation of the 
proposed solution to computationally cheap models or situations with large 
computational resources. 
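
Point 5 above can be made concrete with a minimal sketch. The exponential "model" and the standard Normal input are assumptions chosen for illustration, not a model used in UncertWeb. A symmetric input distribution is pushed through a non-linear response by Monte Carlo sampling, and the output sample is compared with the result of simply running the model at the mean input.

```python
import math
import random
import statistics


def model(x):
    """Hypothetical non-linear model: a simple exponential response."""
    return math.exp(x)


rng = random.Random(42)
inputs = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # simple parametric input
outputs = [model(x) for x in inputs]                  # one model run per draw

output_mean = statistics.fmean(outputs)
output_median = statistics.median(outputs)
model_at_mean_input = model(statistics.fmean(inputs))
```

For this choice the output is log-normally distributed: it is right-skewed, its mean lies well above its median, and the mean of the outputs is not the output at the mean input. A single run at a "best estimate" input would therefore misrepresent both the centre and the shape of the output distribution.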


The above issues are challenging, but must be addressed in order to make progress 
and ensure that the Model Web, or indeed any modelling framework, is of practical 
use. Additional tools are needed to support the practical usage of uncertainty 
management, for example to address the current lack of uncertainty information, to 
reduce the computational demands, to manage the issue of changes in spatial and 
temporal scale and to communicate the uncertain outputs of the modelling workflows. 


In this article we describe a coherent framework for extending the Model Web 
concept of integrated modelling while also taking into account uncertainties. The 
framework described will be realised by the UncertWeb project 
(http://uncertweb.org). 


The paper is organised as follows. Section 2 introduces the existing approaches to 
managing uncertainty in modelling, setting the context for the later work. Section 3 
describes the practical issues that arise in quantifying and analysing the propagation 
of uncertainty. Section 4 reviews existing modelling frameworks with a focus on their 
ability to support uncertainty management and to interoperate with the Model Web. 
The solutions proposed within the UncertWeb project are described in Section 5, 
including the key tools that enable users to exploit the “uncertainty enabled Model 
Web” effectively. The article concludes with a discussion of the likely impact of the 
“uncertainty enabled Model Web” on future scientific activities and highlights the 
areas that require further research. 


2 Uncertainty in modelling 


“All models are wrong; some are useful.” (Box and Draper, 1987). This statement 
originally referred to statistical models but is equally true of physical-deterministic 
models of complex environmental systems. 


Uncertainty is a challenging notion for scientists who have often been trained 
following a mechanistic, deterministic modelling paradigm. Yet all models are 
abstractions and simplifications of the complex reality they aim to represent. In this 
work we do not discuss the various types of, and basis for, uncertainty identified in 
more philosophical research (e.g., Smets, 1991; Dawid, 2004; Sigel et al., 2010), but 
rather focus on an operational / practical approach to uncertainty. Almost all 
uncertainty we seek to address within this work might be characterised as epistemic 
uncertainty arising from a lack of knowledge, rather than intrinsic randomness, or lack 
of precision in semantics. 


2.1 Origins of uncertainty 


While several mathematical and computational frameworks exist for working under 
uncertainty or incomplete knowledge, we argue that, practically, a subjective 
Bayesian approach (Jaynes, 2003) is the most natural choice when working with 
models, observations and their relation to reality (see also Dawid (2004) for an 
interesting discussion on this issue). Management of uncertainty is essential when 
working with models of real systems (Brown, 2010). The main uncertainties arise 
from uncertainties on model inputs (which are often either direct observations from 
sensors or data derived from observations using other models), and from model 
structure uncertainty. Uncertainties on observations or derived data can be identified 
with: 

• measurement uncertainty: the intrinsic uncertainty in a given measurement,
due to noise in the electronics of the sensor system (Desenfant and Priel, 2006);

• representativity uncertainty: additional uncertainty arising from the difference
between the spatial and temporal sampling footprint of the sensor and the
defined spatial and temporal representation of reality (Frehlich, 2011);

• sensor model uncertainty: incomplete knowledge of the sensor, or of the
forward observation model which maps the measured quantity to the target
variable (Agarwal, 1998);

• transmission uncertainty: possible artefacts and processing errors introduced
by the computer systems and electronics that carry and process the sensor
observations (Bullen et al., 2003).


This list is not exhaustive and rarely are all sources of uncertainty known. A more 
complete discussion of observational errors can be found in Hill and Tiedeman (2007; 
Chapter 3). Often, information about the uncertainty of an observation can only be 
determined a posteriori, using validation campaigns. In such a setting the overall 
uncertainty with respect to reality is assessed using carefully quality controlled 
‘reference observations’ which are often assumed to have negligible error, or using 
techniques such as triple collocation (Stoffelen, 1998). Using validation data it is 
possible to estimate the overall uncertainty. If some data are retained for testing only, 
the uncertainty judgements made on the observations can also be validated (Gneiting 
et al., 2007). 
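
One simple way to validate uncertainty judgements against held-out data is to check how often the reference values fall inside the stated intervals. The sketch below assumes Gaussian uncertainties and error-free reference observations, and uses synthetic data; the function name is hypothetical, and this is only one of many possible diagnostics.

```python
import random
from statistics import NormalDist


def interval_coverage(means, sds, references, level=0.95):
    """Fraction of reference values inside the central `level` interval.

    Assumes each stated uncertainty is Gaussian with the given mean and
    standard deviation (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = sum(1 for m, s, r in zip(means, sds, references) if abs(r - m) <= z * s)
    return inside / len(references)


# Synthetic held-out data: the true errors have standard deviation 1.0.
rng = random.Random(1)
n = 5000
observed = [rng.gauss(0.0, 1.0) for _ in range(n)]
reference = [0.0] * n  # reference observations, assumed error-free

well_calibrated = interval_coverage(observed, [1.0] * n, reference)
overconfident = interval_coverage(observed, [0.5] * n, reference)
```

A stated standard deviation that matches the true error yields coverage close to the nominal 95%, whereas an overconfident statement (here half the true standard deviation) yields markedly lower coverage.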


Model uncertainty (Allen et al., 2002; Brown and Heuvelink, 2005; Lindenschmidt et 
al., 2007; Refsgaard et al., 2007; Goldstein and Rougier, 2009; Park et al., 2010; 
Smith and Marshall, 2010) is even more complex and can arise from a range of causes 
including: 

• mechanism / structural uncertainty: it is impossible to include all mechanisms
and physical, chemical, biological or human processes that act on reality in the
model; they must be simplified and prioritised (Refsgaard et al., 2006a);

• representation uncertainty: for spatial, temporal and spatio-temporal models it
is necessary to map the space, time and space-time fields of the real system to
the model variables, typically by discretisation or projection onto some basis
such as a grid, set of elements or harmonic expansion. This introduces
uncertainty due to the finite dimensional nature of the discrete representation
(Frehlich, 2011);

• parameter uncertainty: many inputs to a model cannot be directly observed,
and we tend to think about these as being parameters in the models, whose
values are often empirically determined but essentially unknown (Aster et al.,
2005; Tarantola, 2005; Gallagher and Doherty, 2007);

• numerical uncertainty: non-trivial models will require some sort of solver,
often integrating differential or difference equations forward in time, and these
together with the finite precision representation on digital computers will
introduce additional uncertainty (Ataie-Ashtiani and Hosseini, 2005; Clark and
Kavetski, 2010).


As with observational uncertainty, this list is not exhaustive, and is missing a 
description of the now-notorious ‘unknown unknowns’ (Meyers, 1969, Jaher, 1970, 
Kerwin, 1993). Such issues are very challenging to deal with in a quantitative 
framework, but could be important in some complex models, such as Earth System 
Models, where human activity for example is very challenging to model. 
Specification of model structure uncertainty is an open and challenging research 
problem, and many approaches are being pursued, from the more philosophical 
reification approach (Goldstein and Rougier, 2009), through approaches based on 
statistical modelling and inference (e.g. Kuczera et al, 2006) to generative approaches 
which systematically try to simplify more complex models (Cullen and Frey, 1999). 


Figure 1. A schematic representation of the relation between people, reality, models and 
observations. 


As shown in Figure 1, a unified framework is needed to integrate observations, 
models and reality, with the users (people) constructing both the simulators for the 
systems (reality), and the sensors that observe the system. The users also play a 
critical role in the above scheme by selecting and channelling appropriate 
observations to simulators. As discussed above, the processes of modelling and 
observation are both subject to uncertainties. Typically the observations will be used 
in a process of calibration to improve the simulators, so that those simulators produce 
a better fit to the observations of reality. In this process, it is important to consider the 
representativity of those observations, and to use sensible data splitting techniques for 
model validation, in order to identify and avoid over-fitting. However, despite 
calibration and careful formulation, all simulators of real environmental systems 
retain non-trivial uncertainties, as discussed above. The aims of modelling can be 
manifold, from improving understanding of the system to more practical questions of 
prediction or forecasting. When models (or simulators) are used to inform decisions it 
is critical that any uncertainties in the model predictions on which the decisions are 
based are taken into account, because these decisions will affect reality and this in 
turn will affect people. 


Probabilistic uncertainty may be measured or estimated and represented in a number 
of different ways (O’Hagan, 2011), depending on the nature of the phenomenon and 
the instrumentation available. UncertWeb attempts to recognise a range of 
descriptions of probabilistic uncertainty, and to support their practical use. The most 
complete description of a random variable is the probability density function (e.g. 
Gelman et al, 2003). In practice, observed patterns are more commonly fitted to a 
known class of probability distribution functions (e.g., Normal, Poisson) or
summarised using statistics (e.g., moments such as mean, variance and skewness). In
other contexts, single or multiple realisations of the variable may be of most value, or 
may be all that are available. This is further discussed in Section 5.1 in the context of 
the UncertWeb encoding for uncertainty information, UncertML. 
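
The three kinds of description mentioned above (a parametric distribution, summary statistics and realisations) can be converted into one another, with some loss of information in each direction. The sketch below illustrates this with plain Python data classes. The class and function names are hypothetical and do not reproduce the UncertML encoding or any UncertWeb API.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class NormalDistribution:
    """Parametric description: a named distribution with its parameters."""
    mean: float
    variance: float


@dataclass
class SummaryStatistics:
    """Description by moments only; the distributional form is not stated."""
    mean: float
    variance: float


@dataclass
class Realisations:
    """Description by a sample of possible values."""
    values: list


def sample(dist, n, rng):
    """Distribution -> realisations, by random sampling."""
    sd = dist.variance ** 0.5
    return Realisations([rng.gauss(dist.mean, sd) for _ in range(n)])


def summarise(realisations):
    """Realisations -> summary statistics."""
    return SummaryStatistics(
        statistics.fmean(realisations.values),
        statistics.variance(realisations.values),
    )


def fit_normal(realisations):
    """Realisations -> fitted Normal distribution (moment matching)."""
    stats = summarise(realisations)
    return NormalDistribution(stats.mean, stats.variance)


rng = random.Random(0)
draws = sample(NormalDistribution(mean=10.0, variance=4.0), 5000, rng)
stats = summarise(draws)
fitted = fit_normal(draws)
```

Moving from realisations to statistics or to a fitted distribution discards detail (for example skewness or multi-modality), which is one reason why a framework may need to carry realisations through a workflow when the distributional form is unknown.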


3 Practical uncertainty management in modelling frameworks


To quantify a model’s output uncertainties one can either modify the model to allow it 
to propagate uncertainty itself, or wrap it in an environment that manages uncertainty. 
Modifying the model, i.e., the computer code that does the numerical calculations, 
requires that the model be made to understand input uncertainty characterizations, 
carry out computations taking care of these uncertainties, add uncertainties due to 
parameter estimation, and write output uncertainty characterization in addition to the 
model outputs. Although this approach might be computationally the most efficient, 
especially when all or part of the uncertainty can be propagated analytically, it also 
raises a number of problems: (i) it requires access to the source code and permission
to modify it; (ii) it requires deep knowledge of the model source code, and testing of
the modifications; and (iii) the modified model may no longer be identified under the
same name as the original one.


A more practical approach is to keep the model as it is, and wrap it with an 
application (environment) that takes care of the uncertainty characterizations. The 
approach used here is a simple Monte Carlo simulation, as follows:

1. an application wrapped around the model reads the uncertainty distribution on 
an input (and/or model parameter); 

2. if this distribution is not characterized as a sample, the application draws a 
sample of size n from this distribution; 

3. the application runs the model n times with each of the sample elements as
model input, and collects the n model outputs, or realisations, that characterize
the output probability distribution;

4. the application can then convert the model output sample to summary
statistics such as the mean, variance, or quantiles to approximate confidence 
intervals as required for subsequent processing. 
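
The four steps above can be sketched in a few lines of Python. This is a minimal, single-input illustration under stated assumptions, not the UncertWeb implementation: the function name, the dictionary returned and the toy linear model are all hypothetical, and the "model" is an ordinary Python function rather than a Web Service.

```python
import random
import statistics


def monte_carlo_wrapper(model, input_uncertainty, n=1000, rng=None):
    """Propagate input uncertainty through an unmodified model.

    `input_uncertainty` is either an existing sample (any iterable of values)
    or a sampler function taking a random generator and returning one draw.
    """
    rng = rng or random.Random()
    # Steps 1-2: use the input as given if it is already a sample,
    # otherwise draw n values from the supplied sampler.
    if callable(input_uncertainty):
        sample = [input_uncertainty(rng) for _ in range(n)]
    else:
        sample = list(input_uncertainty)
    # Step 3: one model run per sample element.
    realisations = [model(x) for x in sample]
    # Step 4: summary statistics of the output sample.
    cuts = statistics.quantiles(realisations, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "mean": statistics.fmean(realisations),
        "variance": statistics.variance(realisations),
        "interval_95": (cuts[0], cuts[-1]),
        "realisations": realisations,
    }


def toy_model(x):
    """Hypothetical stand-in for the wrapped model."""
    return 2.0 * x + 1.0


result = monte_carlo_wrapper(
    toy_model, lambda rng: rng.gauss(3.0, 0.5), n=4000, rng=random.Random(7)
)
```

Because the toy model is linear, the output mean and variance can be checked analytically (7 and 1 respectively), which makes it a convenient test case before wrapping a model for which no analytic result exists.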


The issue of model error, or structure uncertainty, is less easily handled with such a 
wrapper framework, since this really only allows one to propagate uncertainty on 
inputs such as model parameters and initial conditions. In theory it could be possible 
to include model structure uncertainty in the wrapper framework as an additive or 
multiplicative noise component, which could be simulated and added to the 
realisations generated at step 3. As noted by a reviewer, the UncertWeb framework, 
although not specifically designed with multi-model ensembles in mind, could 
facilitate the creation of such ensembles if a range of competing models for a given 
system were all deployed within the same framework. Such multi-model ensembles 
are often used to assess model structural uncertainty, and this could be an additional 
benefit of exposing models on the web in the manner suggested in this paper. 
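
As an illustration of the additive or multiplicative noise idea, the sketch below perturbs a set of output realisations with simulated noise. The Gaussian form of the noise, its magnitude and the function name are assumptions made for this example; in practice, specifying a defensible noise model is the difficult part.

```python
import random
import statistics


def add_structural_noise(realisations, noise_sd, rng, mode="additive"):
    """Perturb each realisation with simulated model structure noise.

    Assumes zero-mean Gaussian noise, independent between realisations.
    """
    if mode == "additive":
        return [y + rng.gauss(0.0, noise_sd) for y in realisations]
    if mode == "multiplicative":
        return [y * (1.0 + rng.gauss(0.0, noise_sd)) for y in realisations]
    raise ValueError("mode must be 'additive' or 'multiplicative'")


rng = random.Random(3)
# Hypothetical output realisations from a Monte Carlo run (variance about 1).
realisations = [rng.gauss(5.0, 1.0) for _ in range(5000)]
noisy = add_structural_noise(realisations, noise_sd=2.0, rng=rng)
```

In the additive, independent case the variances add, so here the output variance grows from about 1 to about 5.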


The Model Web blueprint requires that such a wrapper application, which we could
call an ‘uncertainty-enabler’, be implemented as an interoperable Web
Service using open standards. The benefits of this are huge: (i) data can be
retrieved directly from the data source provider, (ii) data sources, models, and the
model wrapper can all run on different platforms, under different operating systems, 
and may partially run on computer clusters, in the cloud or on mainframes, (iii) Monte 
Carlo samples can be run in parallel, if the available computing infrastructure allows 
this, and (iv) data or model resources can be exchanged, or re-implemented on 
different systems without significant change to the overall setup. 
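
Benefit (iii) follows from the fact that the Monte Carlo runs are independent of one another. A minimal sketch using Python's standard thread pool is given below; it assumes that each model run is a call to a remote service, so that the client mostly waits on the network, and uses a local stub function in place of a real service call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_realisations_in_parallel(model, sample, max_workers=4):
    """Run the model once per sample element, preserving the sample order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(model, sample))


def remote_model_stub(x):
    """Hypothetical stand-in for a call to a model exposed as a Web Service."""
    return x * x


outputs = run_realisations_in_parallel(remote_model_stub, [1.0, 2.0, 3.0])
```

The achievable speed-up depends on how many concurrent requests the service hosting the model can handle, which is outside the control of the client.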


Setting up such a system as a Web Service of course requires that a Web client is 
available to run it. This client can be an interactive tool such as a workflow modeller 
that allows orchestration and execution of the workflow and that runs in a Web 
browser, or it can be another model wrapper that takes the currently modelled 
workflow as a component (i.e., as a “model”) in a larger model composition exercise,
to realise a further step of model chaining and integration. 


4 Existing frameworks for model-coupling 


The Model Web is just one of a number of model-coupling approaches of varying 
maturity. The differences between a number of these approaches have been 
summarised in Jagers (2010), who notes that conflicting priorities (e.g., performance, 
ease of use and generality) have, paradoxically, led to a surprisingly wide variety of 
alternative solutions to the interoperation challenge. It is particularly useful to note 
that when choosing a framework for existing models, there is often a trade-off in 
convenience - for example, the effort required to standardise the interface of legacy 
code can be substantial, but the resulting usability of the model can be greatly 
increased, since it may then be easily wrapped and combined with other models. 


Elements in the orchestration and composition of environmental models can be 
broadly classified into: 

• standard languages and interfaces;

• workflow and orchestration tools;

• frameworks and framework generators.

The following paragraphs set the scene by describing some commonly-used examples
which illustrate the state of the art. It will be seen that a number of these do not fit 
neatly into the three categories above, and some integrated systems address multiple 
purposes. The interaction between coupling approaches is very important; since the 
ultimate aim is often to ensure re-use of models, deciding on an interface or language 
can be critical for the model developer. For environmental models, spatial data 
models in particular may impose restrictions on the combination of models and the 
mapping of outputs to inputs. These ‘interoperability’ characteristics are further 
investigated in Table 1. We also consider technologies which handle uncertainty in 
integrated models. These include packages for model calibration, parameter 
estimation and sensitivity / uncertainty analysis, and are listed, with application 
examples, in Table 2. 


The languages / interfaces category includes BPEL (Business Process Execution
Language), a widely-used programming language focussed on message transmission
between systems via Web Services, mediated using Web Services Description
Language (WSDL) documents. BPEL includes validation and control flow elements
which can be interpreted by a variety of engines to execute a process flow. Other
initiatives such as OpenMI concentrate less on the framework within which modules
are arranged, and more on standardising the interface which each model presents to
the world, so that the requirements and limitations of each are clear. Recent
adaptations to the OpenMI standard display particular attention to these common
issues of interoperability: for example, allowing more abstract inputs and outputs, and
permitting inputs which have no specific time frame, thus opening up the tools for use
with non-time stepping models. The Common Component Architecture (CCA) is
another component standard which appeals to scientific modellers largely because of
its support for multi-dimensional data arrays and parallelisation. In the data mining
community, the XML-based PMML (Predictive Model Markup Language) is
commonly used to exchange summaries of models, complete
with defined inputs and outputs. CSIRO’s ICMS (Interactive Component Modelling
System) is considered under this ‘languages’ heading, since its primary focus is on
allowing the development of executable model components in a system-specific,
C-like language called MickL.


Workflow tools such as Taverna, Kepler, VisTrails and Trident provide user-friendly
GUIs within which modular processing or data entities can be arranged,
inputs mapped to outputs and control / break conditions defined. The resulting
workflow chains can be stored, published, shared and exposed as encapsulated
models, while the component models themselves must simply expose a WSDL 
document describing each process, and its inputs and outputs. Thus these tools can be 
used as engines for interacting with BPEL workflows, as well as, for example, 
compiled C code or R scripts. Another, more specific, orchestration tool, the Open 
Modelling Engine (Rizzoli et al., 1998) can be used to schedule MickL components 
like those mentioned in the paragraph above. 


Finally, there are also a host of more or less discipline-specific frameworks for 
combining models and controlling their execution, such as Delta Shell, FRAMES
(Framework for Risk Analysis of Multi-media Environmental Systems), SME (Spatial
Modelling Environment), Tarsier, ICMS (the Interactive Component Modelling
System), Fluid Earth, TIME (The Invisible Modelling Environment), MCT
(Model Coupling Toolkit), ESMF (Earth System Modelling Framework), OASIS
(Ocean Atmosphere Sea Ice Soil), CESM (Community Earth System Model) and
O-PALM. Many of these frameworks include standard modules for applications such
as hydrological or climate modelling. A recent development is the generic Bespoke
Framework Generator (BFG; Armstrong et al., 2009), which generates wrappers and
control code for model sequences based on standardised model metadata
which is collected in XML schemata. The BFG has been used to generate an updated
version of the GENIE framework for Earth System Modelling, and is being used for
the UK Met Office’s FLUME (Flexible Unified Model Environment).


4.1 Interoperability in modelling frameworks 


Interoperability between the approaches described above is variable, and some aspects 
(often strongly influenced by the discipline from which each approach arose) are 
summarised briefly in Table 1. We give particular attention to the spatial data models 
employed by each approach, since these are of particular importance for many 
environmental models and datasets. To quote Mattot et al. (2009), ‘... the irony in
design of both model evaluation tools and integrated modeling systems is that 
everyone wants to define the ‘standard’ and be the integrative framework’. However, 
there have been considerable moves within the modelling community towards 
interoperability and agreement on common interfaces at least, and Table 1 illustrates 
how this has expanded the usability of coupling technologies. 


In terms of model code, almost all frameworks described here support the use of 
compiled C, but other languages such as Java and Fortran are less universally 
supported and often must be wrapped before use. Jagers (2010) presents an extremely 
useful summary of the capabilities of some of these technologies, particularly with 
regard to their ‘code invasiveness’ (i.e., the requirements that each imposes for
rewriting legacy code) and their capacity to support high-performance computing.
Rahman et al. (2005) describe a typical choice between rewriting and wrapping,
where the decision can depend largely on the complexity and current performance of
algorithm code. In their example, some elements of the model were rewritten in C#,
while others were simply recompiled and wrapped as Windows DLLs. The generation
of interoperable, standardised wrappers is a huge design issue for the Model Web,
where the interfaces to models must be Web Service interfaces. 


The welcome move towards standardised model interfaces again raises a paradox: the 
more widely applicable the interfaces become, allowing new models to easily plug 
into frameworks, the more abstract the descriptions of model inputs and outputs 
become, and the more the semantics specific to the discipline from which the model 
originates are hidden from the outside world. Thus a model with an OpenMI-conformant
interface may accept an ‘object’ which might be a multi-dimensional
raster grid, a set of point observations or a time series. The nature and appropriateness 
of the object might be determined only when the data is parsed, so that the 
responsibility for finding and linking suitable inputs and outputs falls on the user. 
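
The consequence of such abstraction can be illustrated with a small sketch in which each model input or output carries an explicit description, so that a mismatch can be detected before any data are parsed. The `PortDescription` class, its fields and the example values are hypothetical; they are not part of OpenMI or of any framework discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortDescription:
    """Hypothetical declared description of one model input or output."""
    quantity: str   # e.g. "rainfall"
    unit: str       # e.g. "mm/day"
    data_kind: str  # e.g. "raster", "points" or "time_series"


def link_problems(output_port, input_port):
    """Return a list of mismatches; an empty list means the ports can be linked."""
    problems = []
    for field in ("quantity", "unit", "data_kind"):
        produced = getattr(output_port, field)
        expected = getattr(input_port, field)
        if produced != expected:
            problems.append(f"{field}: produces {produced!r}, expects {expected!r}")
    return problems


rain_grid = PortDescription("rainfall", "mm/day", "raster")
rain_gauges = PortDescription("rainfall", "mm/day", "points")

compatible = link_problems(rain_grid, rain_grid)
incompatible = link_problems(rain_gauges, rain_grid)
```

An exact string comparison is of course far cruder than the semantic mediation discussed in this paper, since two disciplines may use different terms for the same quantity; the sketch only shows where such a check would sit.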


Table 1: A variety of coupling technologies of varying granularity, with information on
the specificity of their spatial data models, and their capacity for interoperating with other
toolkits and technologies.

Approach | Language/s and service interfaces | Spatial data model | Interoperates with...
Kepler | Java, PMML, WSDL, BPEL, wrapped C/Fortran | None: responsibility for assessing the appropriateness of inputs devolves to the model | R and Matlab, ImageJ, GRASS and GDAL for GIS
Taverna | Java, WSDL, REST, Beanshell, Rshell, Soaplab | None: responsibility for assessing the appropriateness of inputs devolves to the model | R
Trident | C#, Java .NET, WSDL, BPEL | |
VisTrails | Python, WSDL | | Quantum GIS
SME | STELLA or SMML (translated to C++ for execution) | Frames of Points which may also represent grids or network graphs | Python and CCA (through a Java-based portal)
ESMF | Fortran, C++ | Raster grids | CCA
MCT | Fortran | Raster grids | CCA
Delta Shell | OpenMI | As below; internally, multi-dimensional results can be stored as NetCDF. Spatial vector data model is closely based on OGC features. | GDAL, Google Earth (through KML export)
OpenMI | C# or Java interfaces, wrapped C/Fortran | No explicit description: an input/output ‘Object’ may represent raster or vector data | Fluid Earth (see below) and Delta Shell (see above)
FRAMES | Native C interface with bindings for Java, .NET, Fortran, VB6 and Python | None: responsibility for assessing the appropriateness of inputs devolves to the model | PEST (as a tightly-coupled module)
Fluid Earth | OpenMI-wrapped models | As above | OpenMI (as a coupling mechanism)
TIME | .NET, wrapped C/Fortran | Raster grids, vector data, networks and time series | 
Tarsier | C++ | Raster grids, networks, points and time series | 

If information on the nature of model inputs is also published as standardised 
metadata or as an optional part of the model interface, the task of orchestration is far 
easier, and may even be automated. This is where the issue of ontology and semantics 
becomes important to supplement the abstraction enforced by technical 
interoperability, and to help in achieving context independence without losing access 
to vital domain knowledge. Rahman et al. (2004) note the importance of metadata 
which describes the “properties and capabilities of ..[executable].. components”, and 
specifically chose .NET introspection as the mechanism by which the TIME 
framework would derive this information at runtime, a successful approach which 
has supported the development of a number of hydrological decision support systems 
(e.g., Argent et al., 2009), but one which places a language restriction on the 
developer. The ICMS, by contrast, derives this metadata at the point of model 
compilation, and stores it in a system-specific form. The XML metadata supplied to 
the BFG are used in a similar way, and are currently generated by hand, though a GUI 
to help with this task is planned. In all of these approaches, of course, reflection by 
the user as to the nature of their model and its requirements is necessary and indeed 
important; it is simply required at a different stage of the process and the information 
is encoded in a different way. The FRAMES environment tackles the semantic 
challenge by imposing a ‘design by contract’ approach where users subscribe to (or 
create) a domain ontology containing definitions of what models may produce or 
consume. Models conformant to the supplied dictionary may then be linked through a 
‘contract’. A proposal to standardise uncertainty information via similar dictionaries 
is described in Section 5.1. 


Of particular interest in the Model Web context are the growing efforts to adapt the 
above model-coupling tools to comply with or use the OGC Web Processing Service 
(WPS) standard (e.g., Guru et al., 2009, Jones et al., 2010, Pratt et al., 2010), which 
raise many pertinent questions about the abstract nature of OGC service 
specifications. Essentially, the flexibility of a WPS in accepting or producing any 
data, in more or less any format, can be problematic when a user who is querying the 
capabilities and interrogating the processes of that WPS lacks the semantic tools to 
understand the nature of the inputs and outputs. In these instances, profiling or 
restriction of the WPS so that it more clearly describes its limitations is extremely 
helpful in identifying whether that WPS really is a valid candidate for chaining with 
another. This restriction is most usually applied through reference to XML application 
schemata, and is fully anticipated in the WPS specification, which states “WPS can be 
thought of as an abstract model of a Web Service, for which profiles need to be 
developed to support use, and standardized to support interoperability”. While this 
requirement is logical, it weakens the case for OGC services as ‘interoperable’ by 
imposing a requirement on users to develop specific clients to consume or chain these 
profiled WPS. If a model is to be usable within the Model Web, its interface must 
either conform to an agreed profile, or must be discovered and consumed by a higher- 
level ‘broker’ which has the capacity to translate the published model metadata into a 
usable format. A proposed solution (the CaaS) is described in Section 5.2. 


Naturally, there is some metadata about models which can never be used in a fully 
automatic way. For example, information on the lineage and previous uses of the 
model, or on the circumstances and contexts to which it is best suited, may in the 
future be encoded in some sort of trust metric, but currently relies on a textual 
description and the judgement of the user. However, much of the necessary 
information on what a model will accept (even complex details such as required data 
granularity and valid geographical range) can be published using schemata and 
dictionaries, providing that these are widely accepted and available. 
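
As a minimal sketch of how such published requirements might be checked automatically, the following Python fragment compares a producer's output description with a consumer's input requirements. The `GridDescriptor` fields, the variable name and the `chaining_issues` helper are all hypothetical illustrations, not part of any existing schema or dictionary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GridDescriptor:
    """Hypothetical, minimal description of a gridded model input or output."""
    variable: str                             # term from a shared dictionary
    bbox: Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)
    resolution: float                         # cell size in map units


def chaining_issues(output: GridDescriptor, required: GridDescriptor) -> List[str]:
    """List the reasons why `output` cannot be passed unchanged to `required`."""
    issues = []
    if output.variable != required.variable:
        issues.append("variable mismatch")
    out_min_x, out_min_y, out_max_x, out_max_y = output.bbox
    req_min_x, req_min_y, req_max_x, req_max_y = required.bbox
    covers = (out_min_x <= req_min_x and out_min_y <= req_min_y
              and out_max_x >= req_max_x and out_max_y >= req_max_y)
    if not covers:
        issues.append("output does not cover required extent")
    if output.resolution != required.resolution:
        issues.append("resampling needed")
    return issues


# Illustrative values only: the producer covers the required extent,
# but on a coarser grid than the consumer expects.
producer_out = GridDescriptor("air_temperature", (0.0, 0.0, 100.0, 100.0), 1.0)
consumer_in = GridDescriptor("air_temperature", (10.0, 10.0, 90.0, 90.0), 0.5)
print(chaining_issues(producer_out, consumer_in))
```

An empty list would indicate that the two models can be linked directly; any entry identifies an intermediate transformation that a user, or a broker, would need to supply.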


4.2 Exchanging uncertainty information between models 


Interface abstraction and the profiling challenge are especially relevant when a model 
workflow is used to handle and propagate uncertainty. As described in Section 2, 
there are diverse sources of uncertainty which, even within a probabilistic framework, 
can be measured and recorded using different numerical summaries and metrics. 
When working in a multi-disciplinary context, the ‘traditional’ representations of 
uncertainty may also vary: for example, the Root-Mean-Squared Error values 
commonly attached to digital elevation models involve an implicit assumption that the 
error in elevation is symmetric (typically Normal) and identically distributed in space, 
while 95% confidence limits given by a sensor manufacturer for a measuring 
instrument may give no indication of how the expected error is distributed within that 
range, or whether it is biased or bounded. Some statistical summaries can be easily 
combined - for example, different probability density functions may be combined 
hierarchically in models to generate conditional probability density functions, and this 
approach underpins Bayesian analysis. Often, however, one representation of 
uncertainty (e.g., a sample) will require explicit transformation in order to be 
combined with another (e.g., a parameterised probability distribution) in direct 
computation. 
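
To illustrate one such transformation, the sketch below summarises a sample as a Normal distribution by moment matching, so that it can be combined analytically with an existing parameterised Normal. It assumes that a Normal approximation is adequate and that the two quantities are independent; the function names and values are illustrative only.

```python
import numpy as np


def normal_from_sample(sample):
    """Summarise a sample as the (mean, variance) of a Normal by moment matching."""
    values = np.asarray(sample, dtype=float)
    return float(values.mean()), float(values.var(ddof=1))


def sum_of_independent_normals(a, b):
    """(mean, variance) of X + Y for independent Normal X and Y."""
    return a[0] + b[0], a[1] + b[1]


sample = [9.0, 10.0, 11.0, 10.0]       # uncertainty supplied as realisations
fitted = normal_from_sample(sample)    # now a parameterised distribution
total = sum_of_independent_normals(fitted, (5.0, 1.0))
print(fitted, total)
```

The reverse transformation, from a parameterised distribution to a sample, is simply random number generation; both directions lose or assume information, which is why the representation in use needs to be declared explicitly.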


A simple example of such a need for transformation in the Model Web would be as 
follows: A climate change scenario model is used to generate predictions of 
temperature for pixels in a geographical area. Each pixel is assigned an expected 
temperature with a statistical range in the form of a parameterised probability 
distribution function — this is assumed to be Normal, and so the outputs of the climate 
model take the form of a mean and variance for each pixel. These two output maps 
are to be fed to a second model, along with other maps, to run 1000 agent-based 
simulations of animal dispersal. However, the second model requires a static 
temperature map as the base for each run, so a plausible realisation of temperature, 
with realistic spatial autocorrelation, must be generated from the statistical summary 
values, through some intermediate transformation service. Some of the aspects of the 
first model’s outputs are implicitly described within the data (for example, 
geographical range, projection and resolution are easily extracted from a GML 
document or netCDF file). If the second model clearly advertises geographical / 
resolution requirements, this allows a user to assess or even automatically identify the 
need for resampling, aggregation or reprojection. Other attributes of the outputs (such 
as lineage information on the climate model or the nature of the estimated 
uncertainty) require similar standard encodings. In particular, it must be clear that the 
type of uncertainty produced by the first model (a probability distribution function) is 
not the same as that required by the second model (a set of single-valued realisations 
for each pixel). 
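
The intermediate transformation in this example can be sketched as follows. The code draws Gaussian realisations that honour the per-pixel mean and variance maps and impose an exponential spatial correlation. The correlation model, the correlation length, the small grid and the function name are assumptions made for illustration; a dense Cholesky factorisation is only practical for small grids, and an operational service would use a geostatistical simulation method suited to large rasters.

```python
import numpy as np


def correlated_realisations(mean, variance, corr_length, n, seed=0):
    """Draw n Gaussian fields with given per-pixel mean and variance and an
    exponential spatial correlation (an assumed, illustrative model)."""
    rows, cols = mean.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    coords = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    # Pairwise pixel distances and the resulting correlation matrix.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-dist / corr_length)
    # A small jitter keeps the Cholesky factorisation numerically stable.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(rows * cols))
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((rows * cols, n))
    # Correlated, unit-variance fields, one per realisation.
    unit_fields = (chol @ white).T.reshape(n, rows, cols)
    return mean + np.sqrt(variance) * unit_fields


# Illustrative inputs: a constant 15 degree mean with variance 4 on an 8 x 8 grid.
mean_map = np.full((8, 8), 15.0)
variance_map = np.full((8, 8), 4.0)
fields = correlated_realisations(mean_map, variance_map, corr_length=3.0, n=1000)
print(fields.shape)
```

Each of the 1000 fields could then serve as the static temperature map for one run of the second model.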


It can be seen from the above example that the requirement for models to fully 
describe themselves is of even more importance when it comes to handling 
uncertainty and propagating it through a workflow. While many models do not 
inherently handle uncertainty information on inputs, they can still be ‘uncertainty- 
enabled’ within a framework by repeated calls which effectively allow a Monte Carlo 
simulation or stochastic sensitivity analysis as described in Section 3. 
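
A minimal sketch of such a wrapper is given below. The deterministic model is called repeatedly with inputs drawn from their assumed distributions, and the outputs are summarised. The toy `runoff` model, its input distributions and the function names are hypothetical and serve only to show the pattern.

```python
import numpy as np


def monte_carlo(model, sample_inputs, n_runs, seed=0):
    """Run a deterministic model n_runs times on sampled inputs and summarise."""
    rng = np.random.default_rng(seed)
    outputs = np.array([model(**sample_inputs(rng)) for _ in range(n_runs)])
    lower, upper = np.percentile(outputs, [2.5, 97.5])
    return {
        "mean": float(outputs.mean()),
        "variance": float(outputs.var(ddof=1)),
        "interval_95": (float(lower), float(upper)),
    }


def runoff(rainfall, coefficient):
    """Toy deterministic model (hypothetical): accepts single input values only."""
    return rainfall * coefficient


def sample_inputs(rng):
    """Assumed input uncertainty: Normal rainfall, uniform runoff coefficient."""
    return {
        "rainfall": rng.normal(100.0, 10.0),
        "coefficient": rng.uniform(0.4, 0.6),
    }


summary = monte_carlo(runoff, sample_inputs, n_runs=5000)
print(summary)
```

The wrapped model never sees a probability distribution; the framework alone is responsible for sampling the inputs and for summarising the outputs.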


4.3 ‘Uncertainty-enabled’ models - current examples 


Many simulation software packages exist which may be used to ‘uncertainty-enable’ 
existing models. Matott et al. (2009) present a very useful review of 65 different tools 
for simulation, calibration, optimisation and model evaluation, and a number of the 
most widely used or pertinent, with example applications, are listed in Table 2. A 
further list of freely available software tools for the development of uncertainty 
management applications is given in Table 3. The most successful and widely-used 
‘uncertainty-enablers’ are model-independent, and sometimes platform-independent; 
this flexibility is generally achieved by a reliance on ASCII-formatted inputs to and 
outputs from the wrapped models. This tradeoff between flexibility and restriction is 
an equally important theme for the Model Web, as discussed in Section 4.1. The 
process of ‘uncertainty-enabling’ can demand significant effort on the part of the user 
(for example, through the generation of template and instruction files to link and feed 
models, or through the conversion of binary model outputs to ASCII formats) but in 
other cases, simulation tools are made easily accessible as modules within existing 
frameworks. If consideration of uncertainty and validation of linked models is to 
become routine, especially among non-expert users, this access to powerful 
simulation tools which can wrap and ‘uncertainty-enable’ models must be 
strengthened and improved. Recent moves in this direction include the integration of 
the PEST parameter estimation toolkit (footnote 30) into FRAMES (Castleton and Meyer, 2009). 
This is an important issue for the Model Web, and a clear opportunity to build on the 
interest and experience within the wider community of integrated modellers. 
Visualisation of the spatio-temporal uncertainty of outputs (more fully discussed in 
Section 5.3.4) is available to varying extents in these solutions, and is also extremely 
important in the presentation and use of propagated uncertainties. 


In many of the above examples, models designed to accept single input values at each 
observation point are wrapped and run multiple times with stochastically-generated 
input values derived from the uncertainty specification on these inputs. In other 
words, though multiple outputs from these models may be summarised to produce 
uncertainty information such as probability distributions, they do not explicitly accept 
such uncertainty information on the inputs. Other models, in contrast, may accept 
statistical summaries such as standard deviations, ranges or quantiles and use this 
uncertainty information internally in their calculations. Provided that 
they can understand the form in which the input uncertainty is encoded, these models 
are already ‘uncertainty-enabled’. One such example is the INTAMAP interpolation 
Web Service, which can consume point observations whose uncertainty is 
represented as parameter values for well-known distributions, and produce 
interpolated maps of mean and variance using an algorithm most suited to the nature 
of the input uncertainty (Pebesma et al., 2011). In an alternative approach which 
combines numerical and semantic elements of uncertainty, the ‘EcoPath’ model 
commonly used in fishery management planning (Pauly et al., 2000) elicits estimates 
of input uncertainty from users through the assignment of ‘pedigrees’, recording the 
lineage of the data (a guess, a global estimate, a measurement) as well as allowing the 
user to select confidence intervals. The two elements of the uncertainty are combined 
to perform potentially complex analyses, for example using the Bayesian ‘Ecoranger’ 
module. 


30 http://www.pesthomepage.org/home.php last accessed 03/10/2011 


Many of the existing frameworks and model implementations tend to address very 
specific application domains and focus largely on model calibration and parameter 
estimation; this makes them highly valuable for handling specialised and complex 
data and algorithms within a research field, but can raise challenges for 
multidisciplinary model chaining. A number of the existing approaches support 
uncertainty propagation through Monte Carlo methods, which have proven value for 
sensitivity and uncertainty analysis. In the following section we develop a generic 
framework for managing uncertainty in the Model Web context, informed by the 
lessons learnt in previous work. 


Table 2: A selection of existing simulation software packages (or modules within frameworks) which may be used to uncertainty-enable existing models.

Name | Reference(s) or web sites | Comments | Examples of use
--- | --- | --- | ---
PEST (Parameter EStimation Toolkit) | Doherty (2004) | Powerful calibration, regularization and optimization toolkit. Implements a variety of parameter estimation methods, and null-space Monte-Carlo approaches for linear and non-linear analysis of uncertainty, parameter identifiability and error variance. | [1] Castleton and Meyer (2009): integration of PEST into FRAMES. Dausman et al. (2010): testing of alternative hypotheses for the wastewater plume movement, by highly-parallelised calibration of candidate models and generation of a subset of ‘superparameters’. [1] Doherty & Hunt (2009): describe statistics (calculated using PEST) to summarise the extent to which each parameter of a model can be identified, and the extent to which the calibration process can improve on the estimate based on prior expert knowledge.
UCODE | Poeter et al. (2005) | Non-linear parameter estimation code; like OSTRICH and PEST, generates confidence intervals and other statistics through model inversion. | [1] Kelson et al. (2002): application to a mine hydrological context, resulting in a highly simplified model with equivalent predictive power. [1] Foglia et al. (2009): refinement of parameters from catchment-scale estimates for calibration of distributed hydrological models.
OSTRICH | http://www.civil.uwaterloo.ca/lsmatott/Ostrich/OstrichMain.html | A versatile tool incorporating a diverse set of algorithms for calibration, optimization and computation of statistics such as parameter correlation / sensitivity, and observation influence. | [1] Rabideau et al. (2005): calibration of multiple AEM groundwater flow models, with particular attention to effects of model precision and observation location. Matott & Rabideau (2008): describe a method (in OSTRICH 1.8) for simultaneous calibration of equally plausible models by adaptive weighting and mapping of parameters between reference and surrogate models.
UNCSAM | Janssen et al. (1994) | Can do model emulation; does not cope with spatially and/or temporally correlated variables. | [1] Bärlund and Tattari (2001): application to the ICECREAM model of field phosphorus loss.
SME (Spatial Modelling Environment) | http://www.uvm.edu/giee/IDEAS/sme/docs/SME_guide.html | | Voinov et al. (1999): ecological-economic spatial process modelling. Villa and Costanza (2000): spatial agent-based modeling (enabled by linking SME with the SWARM agent-based modeling toolkit). Deal and Schunk (2004): scenario modelling of urban sprawl and its effects; particular attention to the importance of model validation.
GENIE-1 | Holden et al. (2010) | Emulation based on ensemble modelling. | [1] Holden et al. (2010): climate prediction.
FRAMES Sensitivity/Uncertainty module | http://mepas.pnl.gov/framesv1/sum3ug.stm | Monte Carlo analysis and Latin hypercube sampling. User supplies parametric distributions for input uncertainty (currently uniform, log-uniform, Normal, or log-Normal). | Babendreier and Castleton (2002): parallelised use in the 3MRA pollutant fate model. [1] Castleton et al. (2006): linked FRAMES with R to calculate & visualize impacts of input uncertainty.
TIME | | User supplies parametric distributions as above. Some visualization of uncertainty (e.g., confidence limits on outputs). | Rahman et al. (2005): incorporation of a Stochastic Climate Library (SCL) into TIME.
SoftIAM | http://www.tyndall.ac.uk/sites/default/files/tr5 1 pdf | Allows Latin hypercube sampling from Normal, log-Normal, uniform, triangular, Beta or Davies* distributions (*specifically for risk assessment). | [1] Warren et al. (2008): SoftIAM used as an interface to BFG for climate modelling.
WADES | http://www.ceh.ac.uk/sci_programmes/Water/Wades_Project/index.html | Work in progress - aims to assess the relative costs and benefits of OpenMI wrappers for integrated modelling. | 
UNCSIM | Reichert, 2006 | Systems analysis toolbox used to link simulators through text input / output files. Supports maximum likelihood parameter estimation & sampling from a variety of multivariate distributions. | Arnold et al. (1998): Soil and Water Assessment Tool (SWAT), watershed-scale hydrological / water quality simulation. Hutson and Wagenet (1991): simulation of nitrogen dynamics in soil.
DUE (Data Uncertainty Engine) | Brown and Heuvelink, 2007 | Quantification of positional and attribute uncertainty in environmental data by probability distributions that take spatial and temporal correlations into account. Can also sample from these distributions for Monte Carlo uncertainty propagation analyses. | Refsgaard et al. (2006b): hydrologic river basin modelling, handling changes of scale. De Bruin et al. (2008): positional uncertainty in agricultural field boundaries for use in precision farming.
Crystal Ball | Oracle (2011) | Spreadsheet based | [1] Dubus et al. (2002): pesticide models.
@RISK | Palisade (2011) | | [1] Dubus et al. (2002): pesticide models.

[1] Rank input parameter contribution to overall uncertainty 





Table 3: Programming tools for development of uncertainty software. 

Tool | Reference / website | Comments | Examples of use / case studies
--- | --- | --- | ---
R | http://www.r-project.org/ | Open source application with a wide selection of statistical / modelling libraries including some spatio-temporal functions. | Langford et al. (2009): assessment of susceptibilities of conservation planning algorithms to input uncertainty. Output uncertainties visualised using R plotting functions, as statistical summary plots. ‘Sensitivity’ (G. Pujol - http://cran.r-project.org/web/packages/sensitivity/): a freely-available R package containing a collection of functions for factor screening and global sensitivity analysis of model output.
python | http://www.python.org/ | Another library-based language with many mathematical and spatial modules. | ModelBuilder (F. Coelho - http://model-builder.sourceforge.net/): graphical tool for simulating models based on ordinary differential equations.
SimLab | http://simlab.jrc.ec.europa.eu/docs/html/main.html | Development framework specifically for sensitivity/uncertainty analysis - supports global methods only. | Le Maire et al. (2011): one of many studies which employ methods implemented in SimLab (in this case, the FAST technique) for Monte Carlo estimation of uncertainties and parameter effects.
DAKOTA | http://dakota.sandia.gov/index.html | Toolkit which implements numerous algorithms for optimisation, experimental design and uncertainty quantification. | Eldred et al. (2011): describes methods (incorporated in DAKOTA) for separating and nesting sampling based on epistemic and aleatory uncertainties, combining local and global gradient-based optimisations.
JUPITER API | http://water.usgs.gov/software/JupiterApi/ | Platform for developing model analysis applications (e.g., UCODE, MMA) with many built-in algorithms. | Banta et al. (2008): description of how the API can be used as a platform for fast testing and prototyping, particularly where weighting of prior information and specification of correlated errors are required.








5 A proposal for an Uncertainty-Enabled Model Web 


The transition from isolated data sets and models deployed on individual computers 
to Web-deployed data sets and models, with well-defined and widely understood 
interfaces and information models, raises challenges that cut across a wide range of 
issues. The management of uncertainty in the Model Web is one of these challenges. 
Within the UncertWeb project a range of tools are being developed to support the 
assessment of uncertainty using expert elicitation (Section 5.3.1), the aggregation and 
disaggregation of spatial and temporal fields (Section 5.3.2), efficient uncertainty and 
sensitivity analysis methods (Section 5.3.3) and the visualisation of uncertain 
variables (Section 5.3.4). These tools are key drivers in promoting uncertainty 
management in the Model Web, and they all use the UncertML standard described 
below for their communication. 


5.1 Representing uncertainty interoperably: UncertML 


UncertWeb adopts a probabilistic approach to representing uncertainty. As a 
conceptual information model for representing probabilistic uncertainty, the 
UncertML* language was devised within the INTAMAP project (Pebesma et al., 
2011) to describe random quantities. Version 1.0 of UncertML was a weakly-typed 
design (Williams et al., 2009), with strong dependencies on Geography Markup 
Language (GML) and extensive use of the Sensor Web Enablement (SWE) standards 
(OGC 08-094r1, 2004). UncertML 2.0, released in February 2011 and developed 
within the UncertWeb project, is a simpler hard-typed design. This reduces flexibility 
but allows more complete interoperability, since software providers can actually claim 
their software supports UncertML 2.0. Hard-typing also permits the implementation 
of an Application Programming Interface (API) that supports encoding and decoding 
of XML and JSON documents. 


The conceptual model for UncertML 2.0 is very simple. A basic abstract uncertainty 
type is specialised to create Distributions (probability distribution functions, including 
mixture models for multi-modal distributions), Statistics (summary statistics, such as 
moments), and Samples (realisations of random variables). These types correspond to 
those described in Section 2.2, and allow UncertML 2.0 to represent uncertainty very 
flexibly. Where possible the more complete description of a probability distribution is 
preferred. UncertML consists of a dictionary to precisely define the semantics of the 
uncertainty elements, and can encode both univariate and multivariate random 
quantities. 
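
The three-way specialisation can be sketched in a few lines of Python. The class names, fields and JSON layout below are hypothetical illustrations of the conceptual model only; they are not the UncertML 2.0 schema, dictionary or API.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Uncertainty:
    """Abstract base type: any description of a random quantity."""

    def to_json(self) -> str:
        # The type name travels with the values, so a consumer knows
        # exactly which kind of uncertainty it has received.
        return json.dumps({type(self).__name__: asdict(self)}, sort_keys=True)


@dataclass
class NormalDistribution(Uncertainty):   # a Distribution
    mean: float
    variance: float


@dataclass
class Quantile(Uncertainty):             # a Statistic
    level: float
    value: float


@dataclass
class RandomSample(Uncertainty):         # a Sample
    realisations: List[float]


# Only explicitly registered types can be decoded (illustrative registry).
KNOWN_TYPES = {cls.__name__: cls
               for cls in (NormalDistribution, Quantile, RandomSample)}


def from_json(text: str) -> Uncertainty:
    """Decode a document produced by to_json; unknown types raise KeyError."""
    name, fields = next(iter(json.loads(text).items()))
    return KNOWN_TYPES[name](**fields)


encoded = NormalDistribution(mean=15.0, variance=4.0).to_json()
decoded = from_json(encoded)
print(encoded)
```

Because every document names one of a closed set of types, a receiving service can reject anything it does not support rather than guessing at its meaning, which is the interoperability benefit of hard typing described above.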


Since the inputs and outputs of complex environmental models are usually spatio- 
temporal, a standard way to integrate UncertML with spatio-temporal data is required. 
UncertML separates concerns by focussing purely on uncertainty, and is designed to 
be used with other standards such as Observations and Measurements (O&M - a 
common XML encoding for exchanging observations in the Web) (Cox, 2007), which 
can be used to define the variables being considered, the sampling or model output 
locations etc. The UncertWeb O&M (O&M-U) profile restricts the O&M 
specification to permit only certain geometries and time units. This tight profiling 
solves many of the abstraction problems described in Section 4.1. Uncertainty can be 
added to an O&M document in two ways: (1) uncertainty can be added as additional 
quality information to the result, or (2) the result itself can be encoded as an uncertain 
value. In both cases, UncertML is used to model and encode the uncertainties. 


3 http://www.uncertml.org/ last accessed 07/05/2011 


While O&M is well-suited to observations with spatial vector geometries, grid-based 
observations are better encoded using the Network Common Data Format (NetCDF), 
an established format for exchanging multi-dimensional gridded environmental data. 
Thus an uncertainty-enabled NetCDF profile (NetCDF-U) has also been developed 
within UncertWeb. As a first step, the UncertML dictionary is used to define the 
variables that contain the uncertainty values. In the longer term, basic data types for 
uncertainty will be defined in the common data model on which NetCDF is based. 


UncertML plays a central role in uncertainty-enabling the Model Web. It is the 
primary mechanism for communicating uncertainty between Web Services (which, in 
the Model Web context, act as model interfaces). Existing standards for 
communicating within Web-based systems (for example, the Open Geospatial 
Consortium series of standards) already have some support for uncertainty in the 
ISO19139 compliant data quality measures (ISO19139, 2007). However, most of the 
measures that are defined for such quality indicators (ISO19138, 2006; ISO19157, 
2011) are not very generic and relevant only to very specialised domains; for 
example, there is no method for representing a probability distribution. Another issue 
with existing encodings is that most models follow the “result” and associated “result 
quality” pattern. For models with model structure uncertainty there is no notion of a 
unique result; rather the result itself is uncertain. Thus it would not be natural to 
always encode uncertainty in the result quality, because this raises the question of what 
to put in the result. One option might be the mean, or expected value, but in some 
situations (for example, where the model predicts the outputs to be in two or more 
plausible states) the mean can be a very misleading, and indeed improbable, result. 
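
A small numerical illustration makes the last point concrete. The sketch below assumes an equal-weight mixture of two Normal distributions (the weights, means and standard deviations are illustrative only) and compares the probability density at the mixture mean with the density at one of the two plausible states.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a mixture; components are (weight, mean, sd), weights sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


# Two equally plausible states, centred on 0 and 10.
components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
mixture_mean = sum(w * m for w, m, _ in components)

density_at_mean = mixture_pdf(mixture_mean, components)
density_at_state = mixture_pdf(0.0, components)
print(mixture_mean, density_at_mean, density_at_state)
```

The mean lies midway between the two states, where the density is several orders of magnitude lower than at either state, so reporting it as "the result" would describe an outcome the model considers almost impossible.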


UncertML addresses the deficiencies in data quality standards by providing a 
standardized way to encode quantified uncertainty such as probability distributions. 
Using the O&M-U profile, model results with spatial vector geometries can now be 
provided encoded as uncertainties (rather than as values with associated uncertainty) 
or with additional metadata about the model result. Similarly, the NetCDF-U profile 
allows for providing uncertain gridded model results in a standardized and meaningful 
way. 


5.2 The UncertWeb architecture 


The UncertWeb framework aims to support uncertainty in the discovery, access and 
chaining of data sets and models, while keeping in line with the Model Web 
principles. The design of the UncertWeb framework architecture is based on the 
following principles: 


i) Re-use of existing tools: Several initiatives and projects at national, regional, 
European and global level already provide resources and tools that could be useful for 
the development of an UncertWeb framework. The adoption of a 
Service-Oriented-Architecture (SOA) style (Erl, 2005) allows us to integrate such 
heterogeneous components. 


ii) Extension of the Service-Oriented-Architecture (SOA): During the last decade, the 
SOA style, successfully adopted in different contexts such as e-Business and e- 
Government (IDABC, 2004), has also been adopted in the development of Web-based 
geospatial resource sharing systems (OGC, 2002). However, in the development of 
the global System-of-Systems (SoS), SOA has limitations due to the growing 
complexity of the overall system. Solutions based on the introduction of specific 
components (brokers) which act as mediators will help to lower the entry barrier for 
users (Nativi and Bigagli, 2009). 


iii) Multiple solutions for uncertainty representation: From a conceptual perspective 
all data should be treated as uncertain. However, it must be acknowledged that almost 
all existing data resources are not treated in this way. Most data sets come simply as a 
series of values, often without any uncertainty information. Therefore the UncertWeb 
system architecture needs to accommodate both kinds of representation: a) data sets 
with uncertain values (e.g. expressed as a probability distribution); b) data sets with 
certain values and associated uncertainty information (e.g. expressed as accuracy 
metadata). 


Figure 2 depicts the UncertWeb architecture in terms of high-level entities, which can 
be categorised as four high-level packages, and a broker component; package 
dependency is reported through directed arrows. Packages are as follows: 


• A GUI package that includes all the components handling user interaction. 
• An Uncertainty Tools package that includes all the components and 
  applications for uncertainty management, such as elicitation and visualization. 
• An Available Services package that collects all the services exposed in the 
  UncertWeb system. It includes the typical geospatial functionalities: 
    o Data View: for presentation and portrayal of data sets; 
    o Data Access: for accessing data sets for further evaluation and use; 
    o Data Catalog: to register and find data sets based on their metadata; 
    o Data Publishing: to provide a persistence layer for data sets and 
      results; 
    o Data Transformation: to process and manipulate data sets. For the 
      UncertWeb purposes, general data transformations are further 
      classified as: 
        ▪ Data Processing: information extraction and processing, by 
          data set aggregation, operation of models on inputs, etc. 
        ▪ Data Conversion: transformation without information 
          extraction, for example change of format or change of 
          coordinate reference system. 
        ▪ Uncertainty Transformation: transforming the representation of 
          data set uncertainty. 
• A Data Types package that includes all the specification and tools for 
  managing uncertainty-enabled data types. These include profiles such as the 
  O&M profile described in Section 5.1. 
• A Composition-as-a-Service (CaaS) component which is controlled through 
  the GUI and gives access to all the available services, adopting an 
  extended-SOA approach (broker-based mediation). 


Figure 2 Dependency view of the main UncertWeb architectural components. 

Figure 2 shows that the Available Services act upon data sets expressed according to 
the available Data Types. The components in the GUI package may also access the 
Uncertainty Tools. On the other hand the GUI needs to access the services in the 
Available Services package. Due to the heterogeneity of the services in terms of 
interfaces, metadata and data models, a direct link would impose a great complexity 
on the GUI components, limiting their usability and consequently the scalability of 
the overall system. Therefore according to the extended SOA approach, a specific 
service broker component called the CaaS (Composition-as-a-Service) component is 
introduced to mediate the interactions between the user and the services. The two 
main tasks of the CaaS are service composition and the publication of workflows as 
services. 


This high-level view helps to highlight some important points: 
• UncertWeb provides different resources: Tools (e.g. the Uncertainty Tools), 
  Services (i.e. the Available Services) and Data (i.e. the Data Types). 
• The CaaS plays a central role, being the component that harmonizes the access 
  to the Available Services (and indirectly to the Data Types). 
The architecture is being further developed and will be more completely described in 
future publications. 


5.3 UncertWeb tools 


The UncertWeb framework can only work if the tools needed to analyse how 
uncertainty propagates through model workflows and to communicate and visualise 
the resulting uncertainties have been implemented properly. Since UncertWeb 
currently uses a Monte Carlo simulation approach for uncertainty propagation, all that 
is required for the actual uncertainty propagation analysis is a computational loop 
around the models as described in Section 3. The tools described here will support the 
Model Web’s capabilities in the quantification of uncertainties in inputs and models 
within service chains, the spatio-temporal aggregation and disaggregation of uncertain 
variables, the analysis of uncertainty and stochastic sensitivity, and the 
communication of output uncertainty to end users and decision makers. These tools 
will prove to be critical in achieving impact across the environmental and, more 
broadly, the applied science user communities, providing a suite of services and 
applications to make the uncertainty-enabled Model Web easy to use. 
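
The computational loop around the models mentioned above can be sketched in a few lines. The following is a minimal illustration only, not the UncertWeb implementation: the function `toy_model`, the input names and the chosen input distributions are all hypothetical stand-ins for a real model service and its elicited uncertainties.

```python
import random
import statistics


def toy_model(rainfall_mm, runoff_coeff):
    """Hypothetical stand-in for a model service: runoff depth in mm."""
    return rainfall_mm * runoff_coeff


def monte_carlo_propagate(model, input_samplers, n_runs, seed=0):
    """Run `model` once for each joint draw of the uncertain inputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        # Draw one value per uncertain input, then execute the model.
        draw = {name: sampler(rng) for name, sampler in input_samplers.items()}
        outputs.append(model(**draw))
    return outputs


# Assumed input uncertainties, chosen purely for illustration.
samplers = {
    "rainfall_mm": lambda rng: rng.gauss(50.0, 5.0),
    "runoff_coeff": lambda rng: rng.uniform(0.2, 0.4),
}
runoff = monte_carlo_propagate(toy_model, samplers, n_runs=5000)
print(statistics.mean(runoff), statistics.stdev(runoff))
```

The model itself is treated as a black box, which is why this approach needs minimal change to existing model components; the price is the number of model runs required.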


5.3.1 Expert Elicitation 


Keeping track of uncertainties in service chains implies that the uncertainties about 
the input data submitted to the chain and the uncertainties associated with the models 
used in the chain are known. In many cases these uncertainties can be derived - for 
example, from the precision of measurement devices, goodness-of-fit of regression 
equations or from statistical sampling error - but sometimes the uncertainty must be 
derived from expert judgement. Expert elicitation is a systematic process of 
formalising and quantifying expert judgements about uncertain quantities, typically in 
probabilistic terms. 


Since the first development of structured expert-opinion elicitation by the RAND 
Corporation in the 1940s (Cooke, 1991), formal expert elicitation has gradually 
become a mature research field. Recently, expert elicitation has attracted more 
attention from statisticians and experts in uncertainty analysis (O'Hagan, 1998; Cooke 
and Goossens, 2000; Meyer and Booker, 2001; O’Hagan et al., 2006). Uncertainty 
about quantities elicited from experts is encoded in the form of a probability 
distribution function. The two statistical frameworks commonly used for this purpose 
are parametric fitting and nonparametric fitting. The former fits expert judgments to 
standard parametric families of distributions and is the method used in UncertWeb. In 
this approach, quantiles of the distribution such as the median and first and third 
quartiles are elicited from the expert using a formal procedure, after which the most 
appropriate shape of the probability distribution is selected automatically and the 
associated parameters are estimated. 
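
As an illustration of parametric fitting from elicited quartiles, the sketch below fits a normal and a lognormal distribution to a median and two quartiles and keeps the candidate that reproduces the elicited quantiles most closely. The candidate set, the moment-free fitting formulae and the least-squares selection rule are assumptions made for this example; they are not a description of the procedure used in the Elicitator.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # standard normal upper quartile, about 0.674


def fit_normal(q1, med, q3):
    return {"family": "normal", "mean": med, "sd": (q3 - q1) / (2 * Z75)}


def fit_lognormal(q1, med, q3):
    return {"family": "lognormal", "meanlog": math.log(med),
            "sdlog": (math.log(q3) - math.log(q1)) / (2 * Z75)}


def quantile(fit, p):
    """Quantile of a fitted distribution at probability p."""
    z = NormalDist().inv_cdf(p)
    if fit["family"] == "normal":
        return fit["mean"] + z * fit["sd"]
    return math.exp(fit["meanlog"] + z * fit["sdlog"])


def best_fit(q1, med, q3):
    """Pick the candidate whose quantiles are closest to the elicited ones."""
    elicited = {0.25: q1, 0.5: med, 0.75: q3}
    candidates = [fit_normal(q1, med, q3)]
    if q1 > 0:  # the lognormal is only defined for positive quantities
        candidates.append(fit_lognormal(q1, med, q3))

    def squared_error(fit):
        return sum((quantile(fit, p) - q) ** 2 for p, q in elicited.items())

    return min(candidates, key=squared_error)


print(best_fit(8.0, 10.0, 12.0)["family"])   # symmetric quartiles
print(best_fit(5.0, 10.0, 20.0)["family"])   # right-skewed quartiles
```

Symmetric quartiles favour the normal candidate, whereas quartiles that are symmetric on a logarithmic scale favour the lognormal one.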


When multiple experts are involved in the elicitation process, a combination of expert 
judgements is needed to utilize knowledge from all experts. Interaction among experts 
is not needed when using mathematical aggregation (O’Hagan et al., 2006). In contrast, 
behavioural aggregation requires some degree of interaction amongst experts. 
UncertWeb approaches experts through the Web, which complicates interaction 
between experts. Hence, a mathematical aggregation of the experts’ opinions is used. 
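
One common form of mathematical aggregation is the linear opinion pool, in which the pooled distribution is a weighted mixture of the individual experts' distributions. The sketch below assumes two experts with normal distributions and equal weights; the text above does not state which pooling rule or weights the tool applies, so both are assumptions made for illustration.

```python
from statistics import NormalDist


def pooled_cdf(experts, weights, x):
    """CDF of the linear opinion pool: a weighted mixture of expert CDFs."""
    return sum(w * e.cdf(x) for e, w in zip(experts, weights))


def pooled_quantile(experts, weights, p, lo, hi, tol=1e-8):
    """Invert the pooled CDF by bisection on the bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pooled_cdf(experts, weights, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two hypothetical experts with different central values and spreads.
experts = [NormalDist(10.0, 2.0), NormalDist(14.0, 3.0)]
weights = [0.5, 0.5]

pooled_mean = sum(w * e.mean for e, w in zip(experts, weights))
pooled_median = pooled_quantile(experts, weights, 0.5, lo=0.0, hi=30.0)
print(pooled_mean, pooled_median)
```

Because the pool is a mixture, it retains disagreement between experts as additional spread rather than averaging it away.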


Figure 3. A screen capture of the Elicitator showing the creation of an elicitation problem. 
Figure 4. A screen capture of the Elicitator showing the expert elicitation interface. 


The implementation of expert elicitation in UncertWeb is provided by the Elicitator*. 
It largely builds on the existing SHELF methodology developed by Oakley and 
O’Hagan (2010). One major extension is that the entire process is Web-based. It 
involves a problem owner who defines their problem, provides background 
information and selects experts. Figure 3 shows an example of the initial Webpage 
viewed by the problem owner. Once experts are selected, they are notified and can 
perform the elicitation independently on their own computer, at any suitable time 
and at their own pace. Figure 4 shows a typical Web-based form that the expert would 
need to fill in. Results are communicated to the expert and provisions for 
reconsidering and changing earlier judgements are provided. Once all experts have 
submitted their opinions, these are aggregated by the tool and the resulting probability 
distribution is stored in UncertML (Figure 5). All stages of the problem are recorded, 
so that the lineage of the elicitation is fully accessible to the problem owner, and 
could potentially also be inserted in a published workflow to support reproducible 
science. 


Figure 5. A screen capture of the Elicitator showing the pooling of expert judgments. 


* http://elicitator.uncertweb.org/ last accessed 8/5/11 


The Elicitator facilitates the elicitation of both numerically continuous and categorical 
variables. In addition, it also supports the elicitation of spatially distributed 
continuous variables, by providing a tool to estimate the semivariogram. 


5.3.2 Spatio-temporal aggregation and disaggregation 


Aggregation and disaggregation are common operations or computational components 
inside environmental models. For instance, hydrological models may aggregate rain 
when they compute river discharge from spatio-temporally distributed rainfall values. 
Alternatively, they may predict spatially distributed soil moisture content from 
catchment average (aggregated) precipitation. Outside the modelling context, 
aggregation and disaggregation are required when the spatial, temporal or 
spatio-temporal resolution (or support) of the model input or output does not match the 
resolution required at the next stage of processing. This sort of functionality is 
typically found in model couplers, such as the OASIS3 coupler (Valcke, 2006), but the 
uncertainty introduced by the aggregation/disaggregation process is not estimated by 
these couplers. 


A very common case is that of time series data. When rainfall data are available on a 
daily basis, but a model requires data on a monthly basis, the time series can be 
temporally aggregated. By spatio-temporal aggregation we mean the computation of a 
single value from a set of (spatially or temporally or spatio-temporally) contiguous 
values, for one or more of these sets. The aggregation involves the application of an 
aggregation function, such as the mean, median, maximum, 95th percentile, or variance. 
Spatio-temporal disaggregation is the reverse process: from one or more aggregated 
values, one or more sets of values for smaller spatial, temporal or spatio-temporal 
units are generated. These processes are also known as upscaling (aggregation) 
and downscaling (disaggregation). Aggregation is typically a relatively simple activity 
when a simple function can be applied, such as taking the average value over a number 
of grid cells, and will reduce uncertainty. More complex forms can involve techniques 
such as block kriging. Disaggregation typically involves more modelling and requires 
ancillary information about the phenomenon that is not available from the aggregated 
data alone (Bierkens et al., 2000); it will increase the uncertainty at the smaller 
scales. 


Because spatio-temporal aggregation and disaggregation are commonly-required 
activities when model chains are formed, the UncertWeb tools will include a generic 
Web Service for spatial, temporal or spatio-temporal aggregation. It will only work 
with Monte Carlo samples, and for each sample element will aggregate the values to a 
new spatio-temporal resolution (Heuvelink and Pebesma, 1999). Disaggregation will 
be implemented prototypically, for a very limited set of cases, using the area-to-point 
kriging technique. 
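
Aggregating Monte Carlo samples amounts to applying the aggregation function to each realisation separately, so that the aggregated values form a new Monte Carlo sample at the coarser resolution. The sketch below illustrates this for a temporal block mean. It is not the UncertWeb service: the function name, the synthetic daily series and, in particular, the assumption of independent daily errors are ours. With correlated errors the reduction in spread would be smaller.

```python
import random
import statistics


def aggregate_realisation(values, block_size, func=statistics.mean):
    """Aggregate one realisation into blocks of `block_size` contiguous values."""
    return [func(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


rng = random.Random(1)
n_realisations, n_days, block = 500, 60, 30

# Synthetic sample: each realisation is a 60-day series with independent
# daily errors (mean 3.0, standard deviation 1.0).
realisations = [[rng.gauss(3.0, 1.0) for _ in range(n_days)]
                for _ in range(n_realisations)]

# Apply the same aggregation to every element of the Monte Carlo sample.
aggregated = [aggregate_realisation(r, block) for r in realisations]

daily_sd = statistics.stdev([r[0] for r in realisations])
block_sd = statistics.stdev([a[0] for a in aggregated])
print(daily_sd, block_sd)
```

Under the independence assumption the standard deviation of a 30-day mean is expected to be close to 1/sqrt(30) of the daily value, which illustrates why averaging reduces uncertainty.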


5.3.3 Uncertainty and sensitivity analysis 


When models are exposed on the Web in a discoverable manner, users will not 
necessarily be familiar with the detail of the models, and their response to inputs. The 
problem becomes even worse when models are composed in workflows, where 
interactions between the different components may produce rather unexpected 
behaviour that needs to be characterised and understood by the users. One way to 
address this is by undertaking uncertainty or sensitivity analysis 
(Oakley and O’Hagan, 2004). 


Uncertainty analysis involves describing the distribution of the outputs given a 
particular distribution on the inputs, which might include some of the inputs being 
fixed, i.e., assumed to be perfectly known. Uncertainty analysis is typically achieved 
using Monte Carlo techniques although screening methods as proposed in Morris 
(1991), or other local methods (Hill and Tiedeman, 2007) can also provide useful 
insights into the model response. Sensitivity analysis involves understanding the 
model’s response to variation in inputs, and can take many forms, including local 
methods based on derivatives (Hill and Tiedeman, 2007) and global methods based 
on variance (Saltelli et al., 2010). Variance-based sensitivity analysis is generally 
regarded as being more useful, since it allows users to apportion the variance in the 
output distribution among the inputs and their interactions over the whole of the 
realistic input space. It is, however, necessary to acknowledge that local sensitivity 
analysis methods based on locally linearising the model, while potentially susceptible 
to errors that arise from strong non-linearities in model response, can prove useful in 
complex models as part of an exploratory analysis (Campolongo et al., 2007) and can 
be used, for example, to assess the ability of observations to inform parameters 
(Foglia et al., 2009) and predictions (Tiedeman et al., 2004; Moore and Doherty, 
2005; Tonkin et al., 2007). 
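
To make the idea of apportioning output variance concrete, the sketch below estimates first-order variance-based sensitivity indices with a Monte Carlo "pick-freeze" estimator of the kind discussed by Saltelli et al. (2010). It is a simplified illustration under stated assumptions: inputs are independent and uniform on [0, 1], the test model is a hypothetical linear function, and outputs are centred before averaging to reduce estimator variance.

```python
import random
import statistics


def first_order_indices(model, n_inputs, n_samples, seed=0):
    """Estimate first-order indices S_i = V_i / V for independent U(0,1) inputs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(x) for x in A]
    y_b = [model(x) for x in B]
    mean_y = statistics.fmean(y_a + y_b)
    var_y = statistics.pvariance(y_a + y_b)

    indices = []
    for i in range(n_inputs):
        # Rows of A with input i taken from B: shares only input i with B.
        y_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = statistics.fmean((fb - mean_y) * (fab - fa)
                               for fb, fab, fa in zip(y_b, y_ab, y_a))
        indices.append(v_i / var_y)
    return indices


def linear_model(x):
    """Hypothetical test model; analytically S_1 = 0.2 and S_2 = 0.8."""
    return x[0] + 2.0 * x[1]


s1, s2 = first_order_indices(linear_model, n_inputs=2, n_samples=20000)
print(round(s1, 2), round(s2, 2))
```

The estimator requires (k + 2) model runs per sample point for k inputs, which shows directly why such analyses become costly for slow models.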


A problem with the Monte Carlo methods used for variance based sensitivity analysis, 
particularly when applied to models with large run times, is the time required to 
undertake such an analysis (O’Hagan, 2011). One possible means to address 
computational costs is to employ emulator technology (Shahsavani and Grimvall, 
2011). Emulation involves creating a statistical surrogate model of the underlying 
model. The surrogate model is fast to evaluate and can be used in place of the original 
model as long as the additional ‘emulator uncertainty’ is accounted for in its usage. 
The ongoing MUCM project* has developed emulation techniques so they can be 
applied to a wider range of models, although limitations remain (see the MUCM 
toolkit** for details, and references therein). 


At present emulation methods are most effective when a small number of outputs are 
being considered, which are real-valued. Very large numbers of outputs require 
multivariate emulation, which entails describing a complex high dimensional 
conditional probability distribution. Discrete valued variables require further 
development of the emulation theory, which is mainly based on Gaussian processes 
(e.g. O’Hagan, 2006) and thus assumes a continuously (and often smoothly) varying 
continuous valued output. Typically one considers emulation for a small number of 
summary outputs, which might be some combination of all the model outputs, for 
example the average temperature over a region, or the proportion of a given land use 
type in a given area over a given time period. If necessary, it is possible to build 
multi-output emulators (Urban and Fricker, 2010; Conti and O'Hagan, 2010); 
however, these are generally rather complex and often it is better to build many very 
accurate individual emulators. These will capture the joint response of the outputs, 
just not the joint emulator uncertainty on the outputs, which will typically be very 
small by design. 


* http://www.mucm.ac.uk/ last accessed 07/05/2011 
** http://www.mucm.ac.uk/toolkit/ last accessed 07/05/2011 


Building an emulator for a computer model, whether exposed as a workflow, a Web 
Service or a single machine executable, is a complex process. There are several steps, 
each involving complex judgements, ideally informed by the model owners or builders. 
The main steps in constructing an emulator for a given workflow, also depicted in 
Figure 6, are: 


• Elicit input ranges and uncertainties: often uncertainties on model inputs will 
not be known, making it necessary to elicit expert beliefs about the values of 
these inputs. This task is supported by the elicitation methods discussed in 
Section 5.3.1. 

• (Optionally) find important inputs: using a process known as screening 
(Morris, 1991), identify inputs that have a significant effect on the model 
output(s) of interest. Identifying inconsequential inputs allows the reduction of 
the dimension of the input space, thus having a positive effect on emulator 
complexity and training efficiency. 

• Design the training set: with a sampling method such as Latin Hypercube 
(Santner et al., 2003), a set of points to cover the input space is generated. The 
model is then run at these points, producing a training set of input-output 
pairs. 

• Train the emulator: an emulator is typically a Gaussian process consisting of 
a mean function, covariance function, and a set of parameters. Training 
typically employs Bayesian inference (O'Hagan, 2006). Once an emulator is 
trained, it can be saved in a portable format such as XML or JSON. 

• Validate the emulator: validation is essential to ensure the probabilistic 
judgements represented are correct (Bastos and O'Hagan, 2008). If validation 
results are unacceptable, parameters can be adjusted and training can be 
restarted. 

• Use the emulator: the emulator can be used for uncertainty analysis, 
sensitivity analysis, calibration, forecasting and decision making (O'Hagan, 
2006). In the uncertainty-enabled Model Web it will be possible to use this 
emulator as you would any other model component. 
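
The design, training, validation and storage steps listed above can be sketched for a one-dimensional input as follows. This is a deliberately simplified illustration and not the UncertWeb tool: the simulator is a hypothetical cheap function, the covariance parameters are fixed by hand rather than inferred, the mean function is a constant, validation is reduced to a maximum prediction error, and the JSON layout is invented for the example.

```python
import json
import math
import random


def latin_hypercube(n_points, lo, hi, rng):
    """One-dimensional Latin hypercube: one random point in each of n strata."""
    width = (hi - lo) / n_points
    points = [lo + (k + rng.random()) * width for k in range(n_points)]
    rng.shuffle(points)
    return points


def sq_exp(x1, x2, variance, length):
    """Squared-exponential covariance function."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(M):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class Emulator:
    """Gaussian process with constant mean and fixed covariance parameters."""

    def __init__(self, xs, ys, variance=1.0, length=0.3, nugget=1e-6):
        self.xs, self.ys = list(xs), list(ys)
        self.variance, self.length, self.nugget = variance, length, nugget
        self.mean = sum(self.ys) / len(self.ys)
        K = [[sq_exp(a, b, variance, length) + (nugget if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        self.L = cholesky(K)
        self.alpha = solve_chol(self.L, [y - self.mean for y in self.ys])

    def predict(self, x):
        """Return the predictive mean and variance at input x."""
        k = [sq_exp(x, xi, self.variance, self.length) for xi in self.xs]
        mean = self.mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_chol(self.L, k)
        var = self.variance - sum(ki * vi for ki, vi in zip(k, v))
        return mean, max(var, 0.0)

    def to_json(self):
        """Store what is needed to rebuild the emulator (hypothetical layout)."""
        return json.dumps({"xs": self.xs, "ys": self.ys,
                           "variance": self.variance, "length": self.length,
                           "nugget": self.nugget})

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


def simulator(x):
    """Hypothetical stand-in for an expensive model run."""
    return math.sin(3.0 * x)


rng = random.Random(42)
design = latin_hypercube(10, 0.0, 1.0, rng)                  # design
emulator = Emulator(design, [simulator(x) for x in design])  # train
validation = latin_hypercube(20, 0.0, 1.0, rng)              # validate
max_abs_error = max(abs(emulator.predict(x)[0] - simulator(x))
                    for x in validation)
restored = Emulator.from_json(emulator.to_json())            # store and reload
print(max_abs_error, restored.predict(0.5))
```

A full validation would also check the predictive variances against the held-out runs, for instance with the diagnostics of Bastos and O'Hagan (2008), rather than only the size of the prediction errors.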


Figure 6. The main steps in constructing an emulator. For each stage, including the optional 
screening step, the methods supported within UncertWeb are listed. 


Several practical implementation issues must be addressed when developing a 
user-driven tool to construct an emulator for a Model Web component. 


• The tool must be able to read descriptions of model inputs and outputs, and 
perform runs for training and validation. For this to be possible without 
requiring specific code, each Web-enabled model must be exposed in a 
standard way. As the tool will be building service requests, this includes any 
inputs and outputs. The service and information model profiles developed 
within UncertWeb aim to facilitate this interoperability. 

• Screening, training and validating an emulator requires several hundred runs 
of the model. If this model takes minutes or even hours to compute, it is 
impractical to require a user to keep a web-based program running for this 
time. Moving this responsibility to the server allows the tasks to be run 
independently of client state, but also introduces resource management and 
reliability issues. Mechanisms for asynchronous execution have been 
developed to queue tasks if resources are unavailable, and resume tasks in the 
event of system failure. 

• Constructing an emulator requires several choices to be made. Some of these 
choices can be set to default values, and some of those default values may be 
changed by expert users. Providing this vast array of options leads to usability 
challenges. A user could be overwhelmed if they are required to make too 
many choices, while an expert user may feel that they do not have enough 
control over the construction process. 
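
The server-side asynchronous execution mentioned in the list above can be illustrated with a small job queue: the client submits a batch of model runs, receives an identifier immediately, and later polls for status and results. The class `JobQueue`, its method names and the `run_batch` function are hypothetical and serve only to show the pattern. In particular, the sketch keeps jobs in memory, so it does not cover resuming tasks after a system failure, which would require persistent storage.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """Minimal in-memory queue: jobs wait until a worker becomes free."""

    def __init__(self, max_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Queue a task and return an identifier the client can poll with."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        future = self._jobs[job_id]
        if future.done():
            return "failed" if future.exception() else "finished"
        return "running" if future.running() else "queued"

    def result(self, job_id, timeout=None):
        """Block until the job has finished (or the timeout expires)."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def run_batch(inputs):
    """Hypothetical stand-in for a batch of training runs of a model."""
    return [x * x for x in inputs]


queue = JobQueue(max_workers=1)
job_id = queue.submit(run_batch, [1.0, 2.0, 3.0])
results = queue.result(job_id, timeout=10)
final_status = queue.status(job_id)
queue.shutdown()
print(final_status, results)
```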


In theory it is possible to greatly speed up the computation of uncertainty and 
sensitivity analysis using emulators, and they are likely to prove particularly effective 
in distributed modelling frameworks where the emulators can be invoked just as any 
other model component, but can also be transported easily as JSON or XML, and run 
almost instantly. Of course not all models will be amenable to emulation, and it is 
envisaged that often it will be more appropriate to emulate a workflow linking several 
model components directly, not just single model instances. Several classes of model 
are currently not amenable to emulation, including models with discrete valued 
outputs, models with discontinuous outputs (with respect to variation in inputs) and 
models with very large numbers of inputs and outputs. Such models require highly 
case-specific emulation methods which are not currently supported within UncertWeb. 


It is not always clear that construction of an emulator can be justified. Extensive 
examples of the successful utilisation of emulators in modelling studies can be found, 
for example, in the references in Kennedy et al. (2006). However, the construction of 
an emulator is itself expensive, requiring many model runs. There will be cases where 
the expense of constructing an emulator cannot be justified, but in such cases a 
formal uncertainty or variance-based sensitivity analysis will be either impossible (a 
very large or slow-to-evaluate model, for which local methods could be considered) 
or trivial (a very small or fast-to-evaluate model). We do not 
imagine that emulators will be appropriate or required for all workflows, but we do 
anticipate that they will in some cases enhance our ability to characterise and use 
models. 


5.3.4 Visualisation 


Communication and visualisation of results and associated uncertainties produced by 
UncertWeb requires a systematic approach that incorporates contributions from 
cognitive science as well as statistics. A large body of literature and methods is 
available (e.g., Wittenbrink et al., 1996; Pang et al., 1997; MacEachren et al., 2005; 
Kardos et al., 2007; van de Kassteele and Velders, 2006; Garlandini and Fabrikant, 
2009; Wood et al., 2009) and existing state-of-the-art techniques will therefore be 
implemented in the UncertWeb visualisation tool. 


For visualisation of uncertainty it is sensible to distinguish between uncertain 
phenomena that are measured on a continuous-numerical scale (e.g. precipitation, 
concentration of air pollutants, income per head) and those measured on a categorical 
scale, whose groupings may have no numerical meaning or ranking (e.g. soil type, 
land cover, age group). Also, it matters greatly whether the phenomenon varies in 
space, in time, in both space and time, or is constant in space and time (Heuvelink et 
al., 2007). Hence, the various uncertainty visualisations can be conveniently presented 
in a two-dimensional table. The main techniques that will be implemented in 
UncertWeb are given in Table 4. 


For phenomena that vary neither in space nor in time, standard presentation formats 
such as boxplots, pie charts and graphs of the probability distribution (e.g. using the 
jStat JavaScript library) will be used. For spatially distributed phenomena, either static 
displays with adjacent maps or animations of realisations will be provided. The option 
to mask or whiten parts of the study area that are too uncertain will also be provided. 
A very useful option is for Web-based interactive visualisation to allow the user to 
select locations at which visualisation techniques developed for non-spatial and 
non-temporal variables can be applied. Uncertainty in dynamic variables can be displayed 
similarly to uncertainty in spatial variables, with additional possibilities, such as 
displaying multiple realisations against time in a single figure. This method does not 
apply to uncertain spatial variables, but in that case multiple realisations can be shown 
in animation mode. Finally, for space-time phenomena the only feasible options are to 
let users select locations or time points (slices) to which any of the uncertainty 
visualisation techniques for spatial or temporal variables will then be applied. 
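
The quantities behind the adjacent maps and the masking option can be computed directly from Monte Carlo realisations. The sketch below derives per-cell mean and standard deviation maps and masks cells whose standard deviation exceeds a threshold. The flat-list grid representation, the function names and the threshold are assumptions for illustration; rendering of the maps is outside its scope.

```python
import statistics


def summarise_cells(realisations):
    """Per-cell mean and standard deviation over a set of realisations.

    `realisations` is a list of maps, each a flat list of cell values.
    """
    cells = list(zip(*realisations))
    means = [statistics.fmean(c) for c in cells]
    sds = [statistics.stdev(c) for c in cells]
    return means, sds


def mask_uncertain(means, sds, max_sd):
    """Replace the mean by None wherever the uncertainty is too large."""
    return [m if s <= max_sd else None for m, s in zip(means, sds)]


# Three realisations of a two-cell map: the second cell is far more uncertain.
realisations = [[1.0, 10.0], [1.2, 14.0], [0.8, 6.0]]
mean_map, sd_map = summarise_cells(realisations)
masked_map = mask_uncertain(mean_map, sd_map, max_sd=1.0)
print(mean_map, sd_map, masked_map)
```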


Table 4. Types of uncertainty visualisation tools implemented in UncertWeb for combinations of 
measurement scale and space-time variability. 

                   | I. continuous numerical                    | II. categorical
-------------------+--------------------------------------------+--------------------------------------------
A. non-spatial,    | 1. graph of the probability density or     | 1. graph of probability distribution
non-temporal       |    cumulative distribution (e.g.,          | 2. pie chart, stacked bars, bar chart
                   |    Figure 5)                               |
                   | 2. error bar, interquartile range,         |
                   |    confidence interval, box plot           |
-------------------+--------------------------------------------+--------------------------------------------
B. spatial         | 1. adjacent maps of the mean and standard  | 1. adjacent maps of the category with
                   |    deviation; adjacent maps of the lower   |    maximum probability and the associated
                   |    and upper limits of a confidence        |    probability
                   |    interval                                | 2. map of the category with maximum
                   | 2. maps of multiple realisations (draws    |    probability but masked, whitened or
                   |    from the probability distribution) in   |    blinking when the probability is below
                   |    one frame                               |    an (interactive) threshold
                   | 3. masking or whitening of areas with      | 3. entropy map
                   |    large uncertainty                       | 4. interactive facility to apply
                   | 4. interactive facility to apply           |    techniques from category A2 at
                   |    techniques from category A1 at          |    selected point locations in map
                   |    selected point locations in map         |
                   | 5. animations of realisations              |
-------------------+--------------------------------------------+--------------------------------------------
C. temporal        | 1. graphs of mean, lower and upper limits  | 1. graph of category with maximum
                   |    of confidence interval, or error bars   |    probability but masked, whitened or
                   |    against time                            |    blinking when the probability is below
                   | 2. multiple realisations plotted in one    |    an (interactive) threshold
                   |    figure                                  | 2. graph of entropy
                   | 3. interactive facility to apply           | 3. interactive facility to apply
                   |    techniques from category B1 at          |    techniques from category B2 at
                   |    selected time points                    |    selected time points
-------------------+--------------------------------------------+--------------------------------------------
D. spatio-temporal | 1. interactive facility to apply           | 1. interactive facility to apply
                   |    techniques from category B1 at time     |    techniques from category B2 at time
                   |    points                                  |    points
                   | 2. interactive facility to apply           | 2. interactive facility to apply
                   |    techniques from category C1 at point    |    techniques from category C2 at point
                   |    locations in map                        |    locations in map
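
For categorical variables, the maximum-probability and entropy maps listed in Table 4 can be derived from per-cell category probabilities. The sketch below uses Shannon entropy with the natural logarithm; the dictionary representation of a cell, the category names and the masking threshold are assumptions for illustration.

```python
import math


def most_likely(probabilities):
    """Return the category with maximum probability and that probability."""
    category = max(probabilities, key=probabilities.get)
    return category, probabilities[category]


def entropy(probabilities):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probabilities.values() if p > 0)


# Two hypothetical land cover cells: one fairly certain, one ambiguous.
cells = [{"forest": 0.9, "grass": 0.1}, {"forest": 0.5, "grass": 0.5}]

modal_map = [most_likely(c) for c in cells]
entropy_map = [entropy(c) for c in cells]
# Mask the modal category where its probability is below a threshold.
masked_map = [cat if p >= 0.6 else None for cat, p in modal_map]
print(modal_map, entropy_map, masked_map)
```

Entropy is zero when one category is certain and largest when all categories are equally probable, so it summarises categorical uncertainty in a single map.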














Overall, the set of tools being developed in UncertWeb and explained in Section 5.3 
describes a minimal requirement for the tools that should be available to a modelling 
framework which claims to provide practical uncertainty support. 


6 Discussion and conclusions 


Existing modelling frameworks address uncertainty to variable extents, and there are 
important lessons for the Model Web: for example, easy-to-use tools which clearly 
describe their assumptions and requirements can encourage users to assess, record and 
use uncertainty information at all stages of the modelling process. Reliable 
communication of uncertainty information between diverse models, across 
disciplines, will avoid the bottlenecks where rich and useful uncertainty information 
can be lost to the decision maker. Tools to assist with the visualisation of uncertainty 
can help to communicate this information to a range of different users whose needs 
may be very different (Davis and Keller, 1997). 


Existing modelling frameworks also have varying levels of interoperability, with 
some being strongly specialised to specific domains, or specific types of problem, for 
example time-stepping models in the OpenMI framework. Interoperability itself is a 
complex topic and can in any case be achieved at a number of levels. We would 
argue that one might consider the following levels of interoperability: 


1. Machine encoding interoperability — a common underlying representation of 
basic data values, e.g. big-endian or byte order assumptions, often IEEE 
standards based. 

2. Format encoding interoperability — use of a common data format which 
specifies, for example, header structure or the order of elements, delimiters 
and tags. Examples are NetCDF, GML and O&M application schema, 
shapefile, etc. 

3. Semantic dictionary interoperability - understanding of the meaning of the 
data values, based on semantics / ontologies, for example RDF / OWL. This is 
‘hardwired’ semantics via a dictionary. 

4. Semantic machine interoperability - the real goal of semantic integration 
where machines can 'understand' concepts and reason with them, typically 
without resorting to a central controlled vocabulary. 

5. Information interoperability — here, the relation of the data to reality is 
quantified so that the data can be used appropriately in a given application. At 
present this is little addressed. 


To achieve information interoperability it is necessary to quantify the information (or 
uncertainty) in all aspects of the modelling operation. No existing modelling 
frameworks provide a complete solution to managing uncertainty. We contend that 
information interoperability, i.e., the ability not just to share data and models, but to 
actually base rational decisions and policy on the outcomes from these integrated 
modelling frameworks, requires a rigorous and consistent definition of uncertainty 
and a framework that can manage this from end to end. 


When addressing uncertainty, a probabilistic approach seems most natural (Dawid, 
2004; O’Hagan, 2011) although other approaches such as fuzzy set theory and 
Generalised Likelihood Uncertainty Estimation (Beven and Freer, 2001) are also 
applied. Other coherent frameworks for managing uncertainty, for example Bayes 
Linear (Goldstein and Wooff, 2007), and imprecise probability (Reichert, 1997) also 
deserve attention. These frameworks are attractive in that they require fewer 
assumptions (for example, Bayes Linear methods work with expectations rather than 
full probability distributions), but as a result they support only weaker statements 
(since one only has expectations, including some higher order judgements). 


All quantitative approaches also have limitations; ‘unknown unknowns’ will always 
require a qualitative treatment, and in simulations of social systems the issue of 
human choice and free will make modelling particularly challenging, and 
uncertainties still more challenging to quantify. Even for environmental models of 
systems that are reasonably well understood, for example the Earth’s atmosphere, 
obtaining reliable uncertainty estimates for inputs and model structure uncertainty is 
an open research problem. Expert elicitation can assist in the determination of 
subjective uncertainty on unobserved model inputs and more rigorous and precise 
uncertainty estimation for observations can assist in characterising the uncertainty on 
other inputs. As discussed in Section 2.2, careful validation of uncertainty judgements 
should be undertaken whenever possible using appropriate probabilistic methods 
(Gneiting et al., 2007). 


In order for a modelling framework to support probabilistic uncertainty it is necessary 
that: 

• a model for probabilistic uncertainty be defined for communication between all 
components including model and data resources (in this work, this model is 
UncertML); 

• uncertainty be propagated through model components by an appropriate 
mechanism (typically Monte Carlo) with minimal change to the model 
component; 

• where necessary, conversions between different representations of probabilistic 
uncertainty (e.g. a probability distribution to samples) be automated; 

• changes of spatial, temporal and spatio-temporal support be provided 
which also propagate uncertainty. 
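
The automated conversion between representations mentioned in the list above can be sketched as a pair of functions, one drawing samples from a parametric description and one summarising samples by a parametric form. The dictionary layout used here is a hypothetical stand-in and is not the UncertML encoding; the supported families are likewise chosen only for illustration.

```python
import random
import statistics


def to_samples(description, n, seed=0):
    """Convert a parametric description into n Monte Carlo samples."""
    rng = random.Random(seed)
    family = description["family"]
    if family == "normal":
        return [rng.gauss(description["mean"], description["sd"])
                for _ in range(n)]
    if family == "lognormal":
        return [rng.lognormvariate(description["meanlog"], description["sdlog"])
                for _ in range(n)]
    if family == "uniform":
        return [rng.uniform(description["min"], description["max"])
                for _ in range(n)]
    raise ValueError(f"unsupported distribution family: {family}")


def summarise_as_normal(samples):
    """Convert samples back to a normal description by matching moments."""
    return {"family": "normal",
            "mean": statistics.fmean(samples),
            "sd": statistics.stdev(samples)}


samples = to_samples({"family": "normal", "mean": 5.0, "sd": 2.0}, n=4000)
summary = summarise_as_normal(samples)
print(summary)
```

The conversion from samples to a parametric form loses information whenever the sample is not well described by the chosen family, so a framework would need to record which conversions were applied.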

To make the framework accessible to a variety of users, tools which permit the 
following operations would be beneficial: 

• expert elicitation of uncertain inputs; 

• automated assessment of uncertain inputs where observations exist, based on 
statistical inference; 

• uncertainty and sensitivity analysis; 

• visualisation of uncertain variables across space, time and space-time; 

• probabilistic validation of the outputs of the chains when observations are 
available. 
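
As an example of probabilistic validation against observations, the sketch below computes two standard diagnostics from a Monte Carlo sample of an output: the continuous ranked probability score (CRPS), using its sample form E|X - y| - 0.5 E|X - X'|, and the probability integral transform (PIT) value. These are shown as representative choices in the spirit of Gneiting et al. (2007); the text does not specify which diagnostics the framework would provide.

```python
def crps_from_samples(samples, observation):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    term_obs = sum(abs(x - observation) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


def pit_value(samples, observation):
    """Fraction of the sample at or below the observation.

    Over many forecast-observation pairs, PIT values of a well calibrated
    forecast should be approximately uniform on [0, 1].
    """
    return sum(1 for x in samples if x <= observation) / len(samples)


ensemble = [0.0, 2.0]
print(crps_from_samples(ensemble, 1.0), pit_value(ensemble, 1.0))
```

For a single-member sample the CRPS reduces to the absolute error, which makes it directly comparable with deterministic validation scores.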

To address the computational issues it will be necessary to consider parallelism and 
cloud-based deployment, and also the use of emulators, or statistical surrogate models, 
which can be deployed easily on Web Services in a semi-automated manner, and can 
be built either for model components or sections of the complete workflow. 


Many computational, theoretical, architectural and user interaction issues remain to be 
addressed before a comprehensive framework for managing uncertainty can become a 
reality. The UncertWeb modelling framework represents an attempt to address many 
of these issues and to push the boundary of what can practically be achieved closer to 
a complete uncertainty management system. Further development of the UncertWeb 
framework could provide tools to assist with inferential (or 
estimation) problems such as data assimilation and model calibration (parameter 
estimation). Many of the generic calibration and uncertainty evaluation tools listed in 
Table 2 provide useful and tested methodologies to enhance uncertainty management, 
and a concerted effort to integrate the most widely used methods from these tools into 
an interoperable architecture such as that of UncertWeb would be most beneficial to 
all. 


Ultimately it seems natural that we should be considering computers whose basic 
types include not only floats, integers etc., but also the equivalent continuous and 
discrete random variables, in their many representations. Ruckdeschel et al. (2006) 
provide an implementation of this idea in the R environment. This would represent an 
ambitious but important paradigm shift, from viewing uncertainty information as 
metadata which is attached to a quantised or estimated value and may be ignored or 
discarded, to using uncertainty itself as the fundamental element for computation and 
modelling. 
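
The idea of a random variable as a basic type can be illustrated with a small class that overloads arithmetic operators. The sketch below represents a random variable by Monte Carlo samples and combines samples element by element. It is our own illustration and not the implementation of Ruckdeschel et al. (2006); the class name, the sample-based representation and the restriction to addition and multiplication are assumptions.

```python
import random
import statistics


class RandomVariable:
    """An uncertain quantity represented by a Monte Carlo sample."""

    def __init__(self, samples):
        self.samples = list(samples)

    @classmethod
    def normal(cls, mean, sd, n=10000, seed=None):
        rng = random.Random(seed)
        return cls(rng.gauss(mean, sd) for _ in range(n))

    def _combine(self, other, op):
        # Samples are paired element by element, so a variable combined with
        # itself stays perfectly dependent on itself.
        if isinstance(other, RandomVariable):
            return RandomVariable(op(a, b)
                                  for a, b in zip(self.samples, other.samples))
        return RandomVariable(op(a, other) for a in self.samples)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    __radd__ = __add__

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __rmul__ = __mul__

    def mean(self):
        return statistics.fmean(self.samples)

    def sd(self):
        return statistics.stdev(self.samples)


x = RandomVariable.normal(10.0, 1.0, seed=1)
y = RandomVariable.normal(5.0, 2.0, seed=2)
z = 2 * x + y  # ordinary arithmetic, but the result carries its uncertainty
print(z.mean(), z.sd())
```

With such a type, model code written for ordinary numbers propagates uncertainty without modification, which is the shift in perspective described above.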